Hugging Face reveals generative AI performance gains with Intel hardware

Nvidia's A100 GPU accelerator has enabled groundbreaking innovations in generative AI, powering cutting-edge research that is reshaping what artificial intelligence can achieve.

But in the fiercely competitive field of AI hardware, others are vying for a piece of the action. Intel is betting that its latest data center technologies, including the new 4th Gen Intel Xeon CPU (code-named Sapphire Rapids) and the AI-optimized Habana Gaudi2 accelerator, can provide an alternative platform for machine learning training and inference.

On Tuesday, Hugging Face, an open-source machine learning organization, released a series of new reports showing that Intel's hardware delivered substantial performance gains for training and running machine learning models. The results suggest that Intel's chips could pose a serious challenge to Nvidia's dominance in AI computing.

According to the Hugging Face data, the Intel Habana Gaudi2 ran inference on the 176 billion-parameter BLOOMZ model 20% faster than the Nvidia A100-80G. BLOOMZ is a variant of BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), which was first released in 2022 with support for 46 human languages. Going a step further, Hugging Face reported that the smaller 7 billion-parameter version of BLOOMZ runs three times faster on the Intel Habana Gaudi2 than on the A100-80G.

On the CPU side, Hugging Face is publishing data comparing the performance of the latest 4th Gen Intel Xeon CPU with its 3rd Gen predecessor. According to Hugging Face, Stability AI’s Stable Diffusion text-to-image generative AI model runs 3.8 times faster on the new CPU without any code changes. With some modifications, including the use of the Intel Extension for PyTorch with Bfloat16, a 16-bit floating-point format designed for machine learning, Hugging Face said it achieved a speedup of nearly 6.5 times. Hugging Face has posted an online demo that lets anyone experience the speed difference.

"Over 200,000 people come to the Hugging Face Hub every day to try models, so being able to offer fast inference for all models is super important," Hugging Face product director Jeff Boudier told VentureBeat. "Intel Xeon-based instances allow us to serve them efficiently and at scale."

Notably, Hugging Face's new performance claims for Intel hardware did not include a comparison with Nvidia's newer Hopper-based H100 GPU. The H100 has only recently become available to organizations like Hugging Face, which, according to Boudier, has so far been able to do only limited testing with it.

Intel's strategy for generative AI is end-to-end

Intel has a focused strategy for growing the use of its hardware in the generative AI space. The strategy covers both training and inference, not just for the biggest large language models (LLMs) but also for real-world use cases, from the cloud to the edge.

"If you look at this generative AI space, it's still in the early stages and it has gained a lot of hype with ChatGPT in the last few months," Kavitha Prasad, Intel's VP and GM of datacenter, AI and cloud execution and strategy, told VentureBeat. "But the key thing is now taking that and translating it into business outcomes, which is still a journey that's to be had."

Prasad emphasized that an important part of Intel's strategy for AI adoption is enabling a "build once and deploy everywhere" approach. In reality, very few companies can build their own LLMs. Instead, an organization will typically fine-tune existing models, often using transfer learning, an approach that Intel supports and encourages with its hardware and software.

With Intel Xeon-based servers deployed in all manner of environments including enterprises, edge, cloud and telcos, Prasad noted that Intel has big expectations for the wide deployment of AI models.

"Coopetition" with Nvidia will continue with more performance metrics to come

While Intel is clearly competing against Nvidia, Prasad said that in her view it's a "coopetition" scenario, which is increasingly common across IT in general.

In fact, Nvidia uses the 4th Gen Intel Xeon in some of its own products, including the DGX H100 system announced in January.

"The world is going towards a 'coopetition' environment and we are just one of the participants in it," Prasad said.

Looking ahead, she hinted that Intel will release additional performance metrics that will be "very positive." In particular, the next round of MLCommons MLPerf AI benchmark results is due in early April. She also hinted that more hardware is coming soon, including a Habana Gaudi3 accelerator, though she did not provide any details or a timeline.