Nvidia and Intel unveil advanced HPC initiatives, bolstering AI capabilities at SC2023

The world's fastest supercomputers are getting faster, as Nvidia and Intel race to accelerate the most powerful computing systems on Earth with a heavy emphasis on AI.

At the Supercomputing 2023 (SC23) conference in Denver today, the list of the world's fastest 500 supercomputers was released. In one form or another, all the systems have components from Nvidia or Intel, and in many cases both. The event is also a platform to talk about the next generation of supercomputers that are being built, what technologies they use and how they will be used.

For Nvidia, the big new system it is part of is the JUPITER supercomputer hosted at the Forschungszentrum Jülich facility in Germany. JUPITER will have 24,000 Nvidia GH200 chips and, when completed, will be the most powerful AI supercomputer ever built, according to Nvidia, with more than 90 exaflops of performance for AI training. Nvidia is also using the event to detail a series of innovations and AI acceleration silicon, including the H200 and a quad configuration for the Grace Hopper GH200 superchip.

Not to be outdone, Intel is highlighting its work on the Aurora supercomputer at the Department of Energy's Argonne National Laboratory, which is being used to build a 1 trillion (that’s with a T and not a typo) parameter large language model (LLM). Intel is also providing new insights into the next generation of AI acceleration and GPU technology as it ups the competitive ante against rival Nvidia.

Nvidia advances Grace Hopper superchip to build the most powerful AI system in history

Nvidia first announced in May that the Grace Hopper superchip, which combines CPU and GPU capabilities, had entered full production. Those chips have now found their way into the most powerful supercomputers.

"With the introduction of Grace Hopper, a new wave of AI supercomputers are emerging," Dion Harris, director of accelerated data center product solutions at Nvidia, said in a briefing with press and analysts.

The Grace Hopper GH200 powers the JUPITER supercomputer, which Nvidia sees as a new class of AI supercomputers. The AI power of the JUPITER will be used for weather prediction, drug discovery and industrial engineering use cases. JUPITER is being built in collaboration with Nvidia, ParTec, Eviden and SiPearl.

JUPITER is using a new configuration for the GH200 that delivers dramatically more performance. The system uses a quad GH200 architecture, which, as the name implies, puts four GH200s in each system node.

"The quad GH200 features an innovative node architecture with 288 Neoverse ARM cores capable of achieving 16 petaflops of AI performance with 2.5 terabytes a second of high-speed memory," Harris explained. "The four-way system is connected with high-speed NVLink connections to the chip allowing for full coherence across the architecture."

In total, the system comprises 24,000 GH200 chips that are connected via Nvidia's Quantum-2 InfiniBand networking. The JUPITER isn't the only system that will use the quad GH200 approach. Nvidia will be using the same approach in other supercomputers as well.
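The figures quoted above can be cross-checked with some quick arithmetic. The short Python sketch below is illustrative only: it assumes every one of the 24,000 GH200 chips sits in a quad node and that peak AI performance scales linearly with node count, neither of which Nvidia stated explicitly. The function name is hypothetical.

```python
# Figures as quoted in the article (Nvidia's claims, not measured results).
TOTAL_GH200_CHIPS = 24_000      # chips in JUPITER
GH200_PER_NODE = 4              # "quad GH200" node
AI_PETAFLOPS_PER_NODE = 16      # AI performance per quad node
ARM_CORES_PER_NODE = 288        # Neoverse cores per quad node


def jupiter_estimate():
    """Back-of-envelope estimate; assumes all chips are in quad nodes
    and that peak AI performance adds up linearly across nodes."""
    nodes = TOTAL_GH200_CHIPS // GH200_PER_NODE
    cores_per_chip = ARM_CORES_PER_NODE // GH200_PER_NODE
    ai_exaflops = nodes * AI_PETAFLOPS_PER_NODE / 1000  # 1 exaflop = 1,000 petaflops
    return nodes, cores_per_chip, ai_exaflops


nodes, cores_per_chip, ai_exaflops = jupiter_estimate()
print(f"{nodes} nodes, {cores_per_chip} cores per chip, ~{ai_exaflops} AI exaflops")
```

Under those assumptions the estimate comes to 6,000 nodes and roughly 96 exaflops of peak AI performance, which is in line with the "more than 90 exaflops" Nvidia cites for the full system.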

As part of the SC23 news, Nvidia is also announcing the standalone H200 silicon. While the GH200 integrates both CPU and GPU, the H200 is just a discrete GPU. The Nvidia H200 will be offered on Nvidia HGX H200 server boards.

"The HGX H200 platform with faster and more high-speed memory will deliver incredible performance for HPC and AI inference workloads," Harris said.

Intel GPU efforts continue to advance supercomputer powers

Intel is also making a very strong showing at SC23 with its HPC and AI technologies.

In a briefing with press and analysts, Ogi Brkic, VP and General Manager, data center and AI/HPC solutions category at Intel, detailed his company's efforts for AI and HPC acceleration.

Brkic highlighted the Intel Data Center GPU Max series and Intel Habana Gaudi 2 accelerator as helping to lead the way for large supercomputing installations like the Dawn Phase 1 supercomputer at the University of Cambridge in the U.K. The Dawn system, which is currently in phase 1, is the fastest AI supercomputer in the U.K. and includes 512 Intel Xeon CPUs and 1,024 Intel Data Center GPU Max Series GPUs.

Aurora, which is being built in the U.S. by Intel, Hewlett Packard Enterprise and the U.S. Department of Energy, will help develop one of the largest large language models (LLMs) in existence. Brkic said that AuroraGPT is a 1 trillion parameter LLM for science research. AuroraGPT is currently being trained across 64 nodes of Aurora, with the goal of eventually scaling it to the entire supercomputer, which has over 10,000 nodes.

"We've worked with Microsoft DeepSpeed optimizations to ensure that this 1 trillion parameter LLM is available for everybody to use," Brkic said. "The potential applications for this type of large language model are incredible, every element of science from biology, chemistry, drug research, cosmology and so on, can be impacted by availability of this generative model."
