Nvidia unveils Grace ARM-based CPU for giant-scale AI and HPC apps

Nvidia unveiled its Grace processor today. It's an ARM-based central processing unit (CPU) for giant-scale artificial intelligence and high-performance computing applications. It's Nvidia's first datacenter CPU, purpose-built for applications that are operating on a giant scale, Nvidia CEO Jensen Huang said in a keynote speech at Nvidia's GTC 2021 event.

Nvidia says Grace will deliver a 10-fold performance leap for systems training giant AI models, using energy-efficient ARM cores. The company said the Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory will be the first to use Grace, which is named for Grace Hopper, who pioneered computer programming in the 1950s. The CPU is expected to be available in early 2023.

"Grace is a breakthrough CPU. It's purpose-built for accelerated computing applications of giant scale for AI and HPC," said Paresh Kharya, senior director of product management and marketing at Nvidia, in a press briefing.

Huang said, "It's the world's first CPU designed for terabyte scale computing."

The CPU is the result of more than 10,000 engineering years of work. Nvidia said the chip will address the computing requirements for the world's most advanced applications -- including natural language processing, recommender systems, and AI supercomputing -- that analyze enormous datasets requiring both ultra-fast compute performance and massive memory.

Grace combines energy-efficient ARM CPU cores with an innovative low-power memory subsystem to deliver high performance with great efficiency. The chip will use a future ARM core dubbed Neoverse.

"Leading-edge AI and data science are pushing today's computer architecture beyond its limits -- processing unthinkable amounts of data," Huang said in his speech. "Using licensed ARM IP, Nvidia has designed Grace as a CPU specifically for giant-scale AI and HPC. Coupled with the GPU and DPU, Grace gives us the third foundational technology for computing and the ability to re-architect the datacenter to advance AI. Nvidia is now a three-chip company."

Grace is a highly specialized processor targeting workloads such as training next-generation NLP models that have more than 1 trillion parameters. When tightly coupled with Nvidia GPUs, a Grace-based system will deliver 10 times the performance of today's Nvidia DGX-based systems, which run on x86 CPUs. In a press briefing, a questioner asked whether Nvidia will compete with x86 chips from Intel and AMD.

Kharya said, "We are not competing with x86 ... we continue to work very well with x86 CPUs."

Grace is designed for AI and HPC applications, but Nvidia isn't yet disclosing additional information about where Grace will be used. Nvidia also declined to disclose the number of transistors in the Grace chip.

Nvidia is introducing Grace as the volume of data and size of AI models grow exponentially. Today's largest AI models include billions of parameters, and their parameter counts are doubling every two and a half months. Training them requires a new CPU that can be tightly coupled with a GPU to eliminate system bottlenecks.
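To put that doubling rate in perspective, here is a minimal Python sketch of the arithmetic. It assumes the two-and-a-half-month doubling period quoted above holds steady, which is a simplification; the function names are illustrative and not taken from Nvidia.

```python
import math

# Doubling period quoted in the article, in months.
DOUBLING_PERIOD_MONTHS = 2.5


def growth_factor(months, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Multiplicative growth after `months`, assuming a fixed doubling period."""
    return 2 ** (months / doubling_period)


def months_to_grow(multiple, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Months needed to grow by `multiple`, assuming a fixed doubling period."""
    return doubling_period * math.log2(multiple)


if __name__ == "__main__":
    print(f"Growth over 12 months: {growth_factor(12):.1f}x")
    print(f"Months for a 1,000-fold increase: {months_to_grow(1000):.1f}")
```

If the rate held, model size would grow roughly 28-fold in a year, and the jump from billions of parameters to trillions (a 1,000-fold increase) would take about 25 months.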

"The biggest announcement of GTC 21 was Grace, a tightly integrated CPU for over a trillion parameter AI models," said Patrick Moorhead, an analyst at Moor Insights & Strategy. "It's hard to address those with classic x86 CPUs and GPUs connected over PCIe. Grace is focused on IO and memory bandwidth, shares main memory with the GPU and shouldn't be confused with general purpose datacenter CPUs from AMD or Intel."

Underlying Grace's performance is 4th-gen Nvidia NVLink interconnect technology, which provides 900 gigabyte-per-second connections between Grace and Nvidia graphics processing units (GPUs) to enable 30 times higher aggregate bandwidth compared to today's leading servers.

Grace will also utilize an innovative LPDDR5x memory subsystem that will deliver twice the bandwidth and 10 times better energy efficiency compared with DDR4 memory. In addition, the new architecture provides unified cache coherence with a single memory address space, combining system and HBM GPU memory to simplify programmability.

"The Grace platform and its Arm CPU is a big new step for Nvidia," said Kevin Krewell, an analyst at Tirias Research, in an email. "The new design of one custom CPU attached to the GPU with coherent NVlinks is Nvidia's new design to scale to ultra-large AI models that now take days to run. The key to Grace is that using the custom Arm CPU, it will be possible to scale to large LPDDR5 DRAM arrays far larger than possible with high-bandwidth memory directly attached to the GPUs."

Grace will power the world's fastest supercomputer for the Swiss center. Dubbed Alps, the machine will feature 20 exaflops of AI processing. (This refers to the amount of computing available for AI applications.) That's about 7 times more computation than is available with the 2.8-exaflop Nvidia Selene supercomputer, the leading AI supercomputer today. Hewlett Packard Enterprise will be building the Alps system.
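The "about 7 times" figure follows directly from the two exaflop numbers quoted above. As a quick check, a short Python sketch (the constant and function names are illustrative, and both figures are Nvidia's stated AI-processing numbers rather than independent measurements):

```python
def speedup(new_exaflops, old_exaflops):
    """Ratio of two throughput figures expressed in the same unit."""
    if old_exaflops <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_exaflops / old_exaflops


# Figures as quoted in the article (AI exaflops).
ALPS_AI_EXAFLOPS = 20.0
CURRENT_LEADER_AI_EXAFLOPS = 2.8

ratio = speedup(ALPS_AI_EXAFLOPS, CURRENT_LEADER_AI_EXAFLOPS)
print(f"Alps vs. today's leading AI system: {ratio:.1f}x")
```

The ratio works out to a little over 7, consistent with the rounded figure in the text.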

Alps, which will come online in 2023, will work on problems in areas ranging from climate and weather to materials sciences, astrophysics, computational fluid dynamics, life sciences, molecular dynamics, quantum chemistry, and particle physics, as well as domains like economics and social sciences. It will run quantum chemistry and physics calculations for the Large Hadron Collider, as well as weather models.

"This is a very balanced architecture with Grace and a future Nvidia GPU, which we have not announced yet, to enable breakthrough research on a wide range of fields," Kharya said.

Meanwhile, Nvidia also said it would make its graphics chips available alongside Amazon Web Services' ARM-based Graviton2 datacenter CPU for cloud computing.

With Grace, Nvidia will embark on a multi-year pattern of creating graphics processing units (GPUs), CPUs, and data processing units (DPUs), and it will alternate between Arm and x86 architecture designs, Huang said.
