Nvidia launches AI foundry service for Microsoft Azure with new Nemotron-3 8B models

Nvidia is strengthening its co-sell strategy with Microsoft. Today, at the Ignite conference hosted by the Satya Nadella-led giant, the chipmaker announced an AI foundry service that will help enterprises and startups build custom AI applications on the Azure cloud, including those that can tap enterprise data with retrieval augmented generation (RAG).

“Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise and giant-scale AI factory. We built this in Microsoft Azure so enterprises worldwide can connect their custom model with Microsoft’s world-leading cloud services,” Jensen Huang, founder and CEO of Nvidia, said in a statement.

Nvidia also announced new 8-billion parameter models – also a part of the foundry service – as well as the plan to add its next-gen GPU to Microsoft Azure in the coming months.

How will the AI foundry service help on Azure?

With Nvidia’s AI foundry service on Azure, enterprises using the cloud platform will get all key elements required to build a custom, business-centered generative AI application at one place. This means everything will be available end-to-end, right from the Nvidia AI foundation models and NeMo framework to the Nvidia DGX cloud supercomputing service.

“For the first time, this entire process with all the pieces that are needed, from hardware to software, are available end to end on Microsoft Azure. Any customer can come and do the entire enterprise generative AI workflow with Nvdia on Azure. They can procure the required components of the technology right within Azure. Simply put, it's a co-sell between Nvidia and Microsoft,” Manuvir Das, the VP of enterprise computing at Nvidia, said in a media briefing.

To provide enterprises with a wide range of foundation models to work with when using the foundry service in Azure environments, Nvidia is also adding a new family of Nemotron-3 8B models that support the creation of advanced enterprise chat and Q&A applications for industries such as healthcare, telecommunications and financial services. These models will have multilingual capabilities and are set to become available via Azure AI model catalog as well as via Hugging Face and the Nvidia NGC catalog.

Other community foundation models in the Nvidia catalog are Llama 2 (also coming to Azure AI catalog), Stable Diffusion XL and Mistral 7b.

Once a user has access to the model of choice, they can move to the training and deployment stage for custom applications with Nvidia DGX Cloud and AI Enterprise software, available via Azure marketplace. The DGX Cloud features instances customers can rent, scaling to thousands of NVIDIA Tensor Core GPUs, for training and includes the AI Enterprise toolkit, which brings the NeMo framework and Nvidia Triton Inference Server to Azure’s enterprise-grade AI service, to speed LLM customization.

This toolkit is also available as a separate product on the marketplace, Nvidia said while noting that users will be able to use their existing Microsoft Azure Consumption Commitment credits to take advantage of these offerings and speed model development.

Notably, the company had also announced a similar partnership with Oracle last month, giving eligible enterprises an option to purchase the tools directly from the Oracle Cloud marketplace and start training models for deployment on the Oracle Cloud Infrastructure (OCI).

Currently, software major SAP, Amdocs and Getty Images are among the early users testing the foundry service on Azure and building custom AI applications targeting different use cases.

What’s more from Nvidia and Microsoft?

Along with the service for generative AI, Microsoft and Nvidia also expanded their partnership for the chipmaker’s latest hardware.

Specifically, Microsoft announced new NC H100 v5 virtual machines for Azure, the industry’s first cloud instances featuring a pair of PCIe-based H100 GPUs connected via Nvidia NVLink, with nearly four petaflops of AI compute and 188GB of faster HBM3 memory.

The Nvidia H100 NVL GPU can deliver up to 12x higher performance on GPT-3 175B over the previous generation and is ideal for inference and mainstream training workloads.

In addition, the company plans to add the new Nvidia H200 Tensor Core GPU to its Azure fleet next year. This offering brings 141GB of HBM3e memory (1.8x more than its predecessor) and 4.8 TB/s of peak memory bandwidth (a 1.4x increase), serving as a purpose-built solution to run the largest AI workloads, including generative AI training and inference.

It will join Microsoft’s new Maia 100 AI accelerator, giving Azure users multiple options to choose from for AI workloads.

Finally, to accelerate LLM work on Windows devices, Nvidia announced a bunch of updates, including an update for TensorRT LLM for Widows, which introduces support for new large language models such as Mistral 7B and Nemotron-3 8B.

The update, set to release later this month, will also deliver five times faster inference performance which will make running these models easier on desktops and laptops with GeForce RTX 30 Series and 40 Series GPUs with at least 8GB of RAM.

Nvidia added TensorRT-LLM for Windows will also be compatible with OpenAI’s Chat API through a new wrapper, enabling hundreds of developer projects and applications to run locally on a Windows 11 PC with RTX, instead of in the cloud.

How will the AI foundry service help on Azure?

What’s more from Nvidia and Microsoft?

More