MLPerf Inference 3.0 results show 30% performance gain across multiple vendors

As the demands for artificial intelligence (AI) and machine learning (ML) continue to grow, there is a corresponding need for even higher levels of performance for both training and inference.

One of the best tools the AI/ML industry has today for measuring performance is the MLPerf suite of benchmarks, developed by the multi-stakeholder MLCommons organization. Today, MLCommons released its extensive MLPerf Inference 3.0 results. This is the first major update to the scores since the MLPerf Inference 2.1 round in September 2022.

Across more than 5,000 performance results, the new round shows marked gains for nearly all inference hardware, across a variety of models and measurement approaches.

Among the vendors that participated in the MLPerf Inference 3.0 effort are Alibaba, ASUS, Azure, cTuning, Deci, Dell, GIGABYTE, H3C, HPE, Inspur, Intel, Krai, Lenovo, Moffett, Nettrix, Neuchips, Neural Magic, Nvidia, Qualcomm, Quanta Cloud Technology, rebellions, SiMa, Supermicro, VMware and xFusion.

MLCommons is also providing scores for power utilization, which is becoming increasingly important as AI inference gains wider deployment. "Our goal is to make ML better for everyone and we really believe in the power of ML to make society better," David Kanter, executive director at MLCommons, said during a press briefing. "We get to align the whole industry on what it means to make ML faster."

How MLPerf looks at inference

There is a significant amount of complexity to the MLPerf Inference 3.0 scores across the various categories and configuration options.

In a nutshell, though, Kanter explained that MLPerf Inference works like this: organizations start with a dataset (for example, a collection of images) and a trained model. MLCommons then requires participating organizations to run inference that meets a specified accuracy target.

The core tasks that the MLPerf Inference 3.0 suite covers are recommendation, speech recognition, natural language processing (NLP), image classification, object detection and 3D segmentation. Inference is measured both directly on the system under test and over a network; the network setup, Kanter said, more closely models data center deployments.

"MLPerf is a very flexible tool because it measures so much," Kanter said.

Key MLPerf Inference 3.0 trends

Across the dizzying array of results spanning vendors and myriad combinations of hardware and software, there are a number of key trends in this round’s results.

The biggest trend is the staggering performance gains made by vendors across the board in less than a year.

Kanter said they saw, in many cases, "30% or more improvement in some of the benchmarks since last round." However, he said, comparing results across vendors can be difficult because the systems are "scalable and we have systems everywhere from the 10 or 20 W range up to the 2 kW range."

Some vendors are seeing gains well above 30%, notably Nvidia. Dave Salvator, director of product marketing at Nvidia, highlighted the gains his company reported for its now-available H100 GPUs. Specifically, Salvator noted a 54% performance gain on the RetinaNet object detection model.

Nvidia had actually submitted results for the H100 in 2022, before it was generally available, and has improved on its results with software optimizations.

"We're basically submitting results on the same hardware," Salvator said. "Through the course of the product life cycle, we typically take up about another 2 times of performance over time" using software enhancements.

Intel is also reporting better-than-average gains for its hardware. Jordan Plawner, senior director of Intel AI products, highlighted the 4th generation Intel Xeon Scalable processors and their integrated accelerator, AMX (Advanced Matrix Extensions). Like Nvidia, Intel had previously submitted preliminary results for its silicon, and those results have now improved.

"In the first submission, it was really us just getting AMX, and to build upon Nvidia's point, now we're actually tuning and improving the software," Plawner said. "We see across-the-board performance improvement on all models between 1.2 and 1.4x, just in a matter of a few months."

Also like Nvidia, Intel expects to see another 2x performance increase on the current generation of its hardware from further software improvements, Plawner said.

"We all love Moore's law at Intel, but the only thing better than Moore's law is actually what software can give you over time within the same silicon."
