What Nvidia’s new MLPerf AI benchmark results really mean

Nvidia released results today against new MLPerf industry-standard artificial intelligence (AI) benchmarks for its AI-targeted processors. While the results look impressive, it is important to note that some of the comparisons Nvidia makes with other systems are not apples-to-apples. For instance, the Qualcomm systems run at a much smaller power footprint than the H100 and are targeted at market segments similar to those of the A100, where the test comparisons are much more equitable.

Nvidia tested its top-of-the-line H100 system based on its latest Hopper architecture; its now mid-range A100 system targeted at edge compute; and its smaller Jetson system targeted at individual and/or edge types of workloads. This is the first H100 submission, and it shows up to 4.5 times higher performance than the A100. According to the chart below, Nvidia has some impressive results for the top-of-the-line H100 platform.

Inference workloads for AI

Nvidia used the MLPerf Inference V2.1 benchmark to assess its capabilities in various workload scenarios for AI inference. Inference is different from machine learning (ML) training, in which models are created and systems “learn.”

Inference is used to run the learned models on a series of data points and obtain results. Based on conversations with companies and vendors, we at J. Gold Associates, LLC, estimate that the AI inference market is many times larger in volume than the ML training market, so showing good inference benchmarks is critical to success.

Why Nvidia would run MLPerf

MLPerf is an industry-standard benchmark series that draws broad input from a variety of companies and models a range of workloads. Included are items such as natural language processing, speech recognition, image classification, medical imaging and object detection.

The benchmark is useful in that it can work across machines from high-end data centers and cloud, down to smaller-scale edge computing systems, and can offer a consistent benchmark across various vendors’ products, even though not all of the subtests in the benchmark are run by all testers.

It can also create scenarios for running offline, single-stream or multi-stream tests that string together a series of AI functions to simulate a real-world example of a complete workflow pipeline (e.g., speech recognition, natural language processing, search and recommendations, text-to-speech, etc.).
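To make the difference between these scenarios concrete, here is a toy sketch of how an offline test and a single-stream test measure different things. It is not MLPerf's actual test harness; `fake_model`, `single_stream_latencies`, `offline_throughput` and `percentile` are hypothetical names invented for illustration, and the "model" is a trivial stand-in for a real inference call.

```python
import time


def fake_model(batch):
    """Hypothetical stand-in for an inference call: one output per input."""
    return [x * 2 for x in batch]


def single_stream_latencies(model, samples):
    """Send one sample at a time and record how long each query takes."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        model([sample])
        latencies.append(time.perf_counter() - start)
    return latencies


def offline_throughput(model, samples):
    """Hand over all samples at once and report samples processed per second."""
    start = time.perf_counter()
    model(samples)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed if elapsed > 0 else float("inf")


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(0, rank - 1)]


samples = list(range(1000))
tail_latency = percentile(single_stream_latencies(fake_model, samples), 90)
throughput = offline_throughput(fake_model, samples)
print(f"single-stream tail latency: {tail_latency:.6f} s")
print(f"offline throughput: {throughput:.0f} samples/s")
```

The point of the sketch is that a single-stream result is a latency figure while an offline result is a throughput figure, so a chip can look strong in one scenario and ordinary in another.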

While MLPerf is broadly accepted, many players feel that running only portions of the test (ResNet is the most common) is a valid indicator of their performance, and these partial results are more generally available than full MLPerf results. Indeed, we can see from the chart that many of the comparison chips do not have test results in other components of MLPerf for comparison to the Nvidia systems, as the vendors chose not to create them.

Is Nvidia ahead of the market?

The real advantage Nvidia has over many of its competitors is in its platform approach.

While other players offer chips and/or systems, Nvidia has built a strong ecosystem that includes the chips, associated hardware and a full stable of software and development systems that are optimized for its chips and systems. For instance, Nvidia has built tools like its Transformer Engine, which can select the floating-point precision (such as FP8, FP16, etc.) best suited to the task at hand at various points in the workflow. This has the potential to accelerate the calculations, sometimes by orders of magnitude. It gives Nvidia a strong position in the market, as it enables developers to focus on solutions rather than on the low-level hardware and related code optimizations that systems without corresponding platforms require.

Indeed, competitors Intel, and to a lesser extent Qualcomm, have emphasized the platform approach, but the startups generally support only open-source options that may not offer the same level of capability as the major vendors provide. Further, Nvidia has optimized frameworks for specific market segments that provide a valuable starting point from which solution providers can achieve faster time-to-market with reduced effort. Startup AI chip vendors can’t offer this level of resource.

The power factor

One area that fewer companies test for is the amount of power required to run these AI systems. High-end systems like the H100 can require 500-600 watts of power to run, and most large training systems use many H100 components, potentially thousands, within their complete system. The operating cost of such large systems is extremely high as a result.
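A rough back-of-the-envelope calculation shows how quickly this adds up. The sketch below takes the 500-600 watt range cited above and assumes, purely for illustration, a system with 1,000 accelerators running around the clock for a year; `annual_energy_kwh` is a hypothetical helper, and the figures cover the accelerators only, not CPUs, networking or cooling.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_kwh(num_accelerators, watts_each, utilization=1.0):
    """Estimate yearly energy use in kilowatt-hours for the accelerators alone."""
    total_watts = num_accelerators * watts_each * utilization
    return total_watts * HOURS_PER_YEAR / 1000.0


# Assumed system size of 1,000 accelerators (illustrative, not a vendor figure).
low = annual_energy_kwh(1000, 500)
high = annual_energy_kwh(1000, 600)
print(f"{low:,.0f} to {high:,.0f} kWh per year")  # 4,380,000 to 5,256,000 kWh per year
```

Multiplying the result by a local electricity rate gives an approximate annual power bill for the accelerators, before any data center overhead is counted.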

The lower-end Jetson consumes only about 50-60 watts, which is still too much for many edge computing applications. Indeed, the major hyperscalers (AWS, Microsoft, Google) all see this as an issue and are building their own power-efficient AI accelerator chips. Nvidia is working on lower-power chips, particularly since Moore’s Law provides power reduction capability as the process nodes get smaller.

However, Nvidia needs to deliver products in the range of 10 watts and below if it wants to fully compete with newer optimized edge processors coming to market, and with companies that have low-power credentials like Qualcomm (and ARM, generally). There will be many low-power uses for AI inference in which Nvidia currently cannot compete.
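One way to put chips with very different power budgets on a more comparable footing is to divide throughput by power draw. The sketch below shows the arithmetic only. The two entries are made-up placeholder numbers, not measured MLPerf results for any vendor, and `BenchmarkResult` is a hypothetical structure introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    samples_per_second: float  # raw throughput reported by a benchmark
    watts: float               # power drawn while running the benchmark

    @property
    def samples_per_joule(self):
        """Throughput per watt; one watt is one joule per second."""
        return self.samples_per_second / self.watts


# Placeholder values chosen only to illustrate the calculation.
results = [
    BenchmarkResult("high-power chip (hypothetical)", 10000.0, 500.0),
    BenchmarkResult("low-power chip (hypothetical)", 2000.0, 50.0),
]

for r in sorted(results, key=lambda r: r.samples_per_joule, reverse=True):
    print(f"{r.name}: {r.samples_per_second:.0f} samples/s, "
          f"{r.samples_per_joule:.1f} samples/joule")
```

In this invented example the high-power chip wins on raw throughput by five times, yet the low-power chip does twice as much work per joule, which is the kind of trade-off that raw benchmark charts tend to hide.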

Nvidia’s benchmark bottom line

Nvidia has shown some impressive benchmarks for its latest hardware, and the test results show that companies need to take Nvidia’s AI leadership seriously. But it’s also important to note that the potential AI market is vast and Nvidia may not be a leader in all segments, particularly in the low-power segment where companies like Qualcomm may have an advantage.

While Nvidia shows a comparison of its chips to standard Intel x86 processors, it does not have a comparison to Intel’s new Habana Gaudi 2 chips, which are likely to show a high level of AI compute capability that could approach or exceed some Nvidia products.

Despite these caveats, Nvidia still offers the broadest product family, and its emphasis on complete platform ecosystems puts it ahead in the AI race. That combination will be hard for competitors to match.