Nvidia launches certification program for AI platforms

Nvidia today launched a formal certification program for systems based on graphics processing units (GPUs) deployed in on-premises IT environments by its OEM partners. The Nvidia-Certified Systems initiative comes at a time when AI models are starting to be both trained and deployed at the network edge rather than trained only in the cloud.

Initial participants in the program include Dell Technologies, Hewlett Packard Enterprise (HPE), Supermicro, Gigabyte, and Inspur Technologies. Each of the systems provided by these vendors incorporates both Nvidia GPUs and the network switches the company gained via its acquisition of Mellanox. Certified systems will be eligible for Nvidia support, delivered through the company's OEM partners.

The training of AI models is moving closer to the network edge because enterprise IT organizations are realizing that, as new data sources become available, it's not practical to retrain an AI model entirely in the cloud and then update an inference engine running at the edge. Organizations running, for example, an application that incorporates computer vision at the network edge will need to be able to update AI models while that application is in use, Nvidia executive Adel El Hallak said.

"You need to be able to bring the model to where the action is taking place," El Hallak said.

Nvidia cites American Express as an example of an organization that is already employing AI models in real time to identify fraud, while Ford is incorporating data in AI models in real time to test self-driving cars. Domino's Pizza, meanwhile, is applying AI in real time to improve predictions for when orders will be ready for customers.

IT organizations will need to be able to continuously update AI models without having to transfer a massive amount of data over a wide area network, El Hallak added. In fact, the need to continuously train and update AI models is driving the rise of machine learning operations (MLOps) as a distinct IT discipline, he noted.

There are, of course, multiple classes of edge computing platforms. Nvidia is in the process of acquiring Arm for $40 billion as part of an effort to extend the reach of its AI software out to not just edge computing platforms but also individual devices. As more platforms and devices incorporate AI models, the new data they are exposed to will create a virtuous cycle through which those AI models will be able to more easily adapt to changing conditions, El Hallak added.

Longer term, El Hallak said, Nvidia expects to extend the Nvidia-Certified Systems initiative to include platforms based on its GPUs that might not necessarily be connected to a Mellanox switch. In the meantime, Nvidia is committed to working closely with OEM partners that are willing to pay for the support it provides their end customers, he noted.

It may be a while before AI models and the systems they run on are pervasively deployed at the network edge and beyond. However, it is clear AI will be incorporated into almost every edge computing platform to one degree or another. In most cases, those edge computing platforms will be running some type of inference engine based on an AI model that was trained in the cloud. The current processes employed to update inference engines by retraining AI models residing on cloud computing platforms are in many cases simply too slow and cumbersome. Applications increasingly need to process and analyze data in real time as it is either being created or collected on an edge computing device or platform.
