New Microsoft AI chip no threat to Nvidia, but growing LLM needs drive custom silicon

Microsoft has been developing a new artificial intelligence (AI) chip, internally code-named Athena, since as early as 2019, according to reporting from The Information today. The company could make Athena widely available for use within the company itself and OpenAI as early as next year.

Experts say these moves won't threaten Nvidia, but they do signal the need for hyperscalers to develop their own custom silicon.

AI chip development in response to a GPU crisis

The chip, like those developed in-house by Google (TPU) and Amazon (Trainium and Inferentia processor architectures), is designed to handle large language model (LLM) training. That is essential because the scale of advanced generative AI models is growing faster than the compute capabilities needed to train them, Gartner analyst Chirag Dekate told VentureBeat by email.

Nvidia is the market leader by a mile when it comes to supplying AI chips, with about 88% market share, according to Jon Peddie Research. Companies are vying just to reserve access to the high-end A100 and H100 GPUs, which cost tens of thousands of dollars each, causing what could be described as a GPU crisis.

"Leading-edge generative AI models are now using hundreds of billions of parameters requiring exascale computational capabilities," Dekate explained. "With next-generation models ranging in trillions of parameters, it is no surprise that leading technology innovators are exploring diverse computational accelerators to accelerate training while reducing the time and cost of training involved."

As Microsoft seeks to accelerate its generative AI strategy while cutting costs, it makes sense that the company develop a differentiated custom AI accelerator strategy, he added, which "could help them deliver disruptive economies of scale beyond what is possible using traditional commoditized technology approaches."

Custom AI chips address the need for inference speed

The need for acceleration also applies, importantly, to AI chips that support machine learning inference — that is, when a model is boiled down to a set of weights that then use live data to produce actionable results. Compute infrastructure is used for inference every time ChatGPT generates responses to natural language inputs, for example.

Nvidia produces very powerful, general-purpose AI chips and offers its parallel computing platform CUDA (and its derivatives) as a way to do ML training specifically, said analyst Jack Gold, of J Gold Associates, in an email to VentureBeat. But inference generally requires less performance, he explained, and the hyperscalers see a way to also impact the inference needs of their customers with customized silicon.

"Inference ultimately will be a much larger market than ML, so it’s important for all of the vendors to offer products here," he said.

Microsoft's Athena not much of a threat to Nvidia

Gold said he doesn't see Microsoft's Athena as much of a threat to Nvidia’s place in AI/ML, where it has dominated since helping power the deep learning "revolution" of a decade ago. Since then, the company has built a powerhouse platform strategy and software-focused approach, and has seen its stock rise in an era of GPU-heavy generative AI.

"As needs expand and diversity of use expands as well, it’s important for Microsoft and the other hyperscalers to pursue their own optimized versions of AI chips for their own architectures and optimized algorithms (not CUDA-specific)," he said.

It’s about cloud operating costs, he explained, but also about providing lower-cost options for diverse customers who may not need or want the high-cost Nvidia option. "I expect all of the hyperscalers to continue to develop their own silicon, not just to compete with Nvidia, but also with Intel in general-purpose cloud compute."

Dekate also maintained that Nvidia shows no signs of slowing down. "Nvidia continues to be the primary GPU technology driving extreme-scale generative AI development and engineering," he said. "Enterprises should expect Nvidia to continue building on its leadership-class innovation and drive competitive differentiation as custom AI ASICs emerge."

But he pointed out that "innovation in the last phase of Moore’s law will be driven by heterogeneous acceleration comprising GPUs and application-specific custom chips." This has implications for the broader semiconductor industry, he explained, especially "technology providers that have yet to meaningfully engage in addressing the needs of the rapidly evolving AI market."
