At the International Supercomputing Conference (ISC) in Frankfurt, Germany this week, Santa Clara-based chipmaker Nvidia announced that it will support processors architected by British semiconductor design company Arm. Nvidia anticipates that the partnership will pave the way for supercomputers capable of “exascale” performance — in other words, of completing at least a quintillion floating point operations (“flops”) per second, where a single flop is one arithmetic operation, such as multiplying two 15-digit numbers.
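For a sense of that scale, consider a rough back-of-the-envelope comparison in Python (the ~100-gigaflop laptop figure is an illustrative assumption, not something from Nvidia’s announcement):

```python
# Back-of-the-envelope scale comparison; the laptop figure is illustrative.
EXAFLOP = 1e18        # one quintillion floating point operations per second
LAPTOP_FLOPS = 100e9  # assume a laptop CPU sustaining ~100 gigaflops

workload = 1e18  # a job requiring one quintillion operations
print(f"exascale system: {workload / EXAFLOP:.0f} second")
print(f"laptop: {workload / LAPTOP_FLOPS / (365 * 24 * 3600):.2f} years")
# The exascale machine finishes in one second what the laptop would
# grind on for roughly a third of a year.
```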

Nvidia says that by 2020 it will bring its full stack of AI and high-performance computing (HPC) software, which by its estimation now accelerates over 600 HPC applications and machine learning frameworks, to the Arm ecosystem. Among other resources and services, it will make available CUDA-X libraries, graphics-accelerated frameworks, software development kits, PGI compilers with OpenACC support, and profilers.

Nvidia founder and CEO Jensen Huang pointed out in a statement that, thanks to this commitment, Nvidia will soon accelerate all major processor architectures: x86, IBM’s Power, and Arm.

“As traditional compute scaling has ended, the world’s supercomputers have become power constrained,” said Huang. “Our support for Arm, which designs the world’s most energy-efficient CPU architecture, is a giant step forward that builds on initiatives Nvidia is driving to provide the HPC industry a more power-efficient future.”

This is hardly Nvidia’s first collaboration with Arm. The former’s AGX platform incorporates Arm-based chips, and its Deep Learning Accelerator (NVDLA) — a modular, scalable architecture based on Nvidia’s Xavier system-on-chip — integrates with Arm’s Project Trillium, a platform that aims to bring deep learning inferencing to a broader set of mobile and internet of things (IoT) devices.

If anything, today’s news highlights Nvidia’s concerted push into an HPC market that’s forecast to be worth $59.65 billion by 2025. To this end, the chipmaker recently worked with InfiniBand and Ethernet interconnect supplier Mellanox to optimize processing across supercomputing clusters, and it continues to invest heavily in 3D packaging techniques and interconnect technology (like NVSwitch) that allow for dense scale-up nodes.

“We have been a pioneer in using Nvidia [graphics cards] on large-scale supercomputers for the last decade, including Japan’s most powerful ABCI supercomputer,” said Satoshi Matsuoka, director at Riken, a large scientific research institute in Japan. “At Riken R-CCS [Riken Center for Computational Science], we are currently developing the next-generation, Arm-based exascale Fugaku supercomputer and are thrilled to hear that Nvidia’s GPU acceleration platform will soon be available for Arm-based systems.”

Nvidia has notched a few wins already. Last fall, the TOP500 ranking of supercomputer performance (based on LINPACK score) showed a 48% year-over-year jump in the number of systems using the company’s GPU accelerators, with the total climbing to 127, three times the figure from five years prior. The list’s two fastest machines, the U.S. Department of Energy’s Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory, both draw their compute from Nvidia GPUs, and other entries featured Nvidia’s DGX-2 Pod, which combines 36 DGX-2 systems and delivers more than 3 petaflops of double-precision performance.

DGX-2 was announced in March 2018 at Nvidia’s GPU Technology Conference in Santa Clara and delivers two petaflops of computational power, which Nvidia pegs as the deep learning throughput of roughly 300 dual-CPU servers occupying 15 racks of datacenter space. It complements HGX-2, a cloud server platform equipped with 16 Tesla V100 graphics processing units (32GB of memory apiece) that collectively provide half a terabyte of memory and two petaflops of compute power.

CUDA-X HPC

Today Nvidia also announced the broad launch of CUDA-X HPC, a collection of compilers, software, and APIs built on top of CUDA, its parallel computing platform and programming model. It’s available from Nvidia’s developer portal and as containerized software stacks in the Nvidia NGC software hub.

CUDA-X HPC includes components tailor-made for HPC applications like computational physics, chemistry, molecular dynamics, and seismic exploration, specifically hardware-accelerated linear algebra, parallel algorithms, and signal and image-processing libraries. cuBLAS, cuSOLVER, and the CUDA Math library are present and accounted for, in addition to tools for optimized tensor primitives (cuTENSOR), fast Fourier transforms (cuFFT), multi-GPU collective communications (NCCL), and more.
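As a rough illustration of how such libraries are typically consumed, here is a minimal sketch using CuPy, a third-party NumPy-compatible Python wrapper that dispatches to cuBLAS and cuFFT under the hood (CuPy serves as an accessible stand-in here; it is not part of the CUDA-X HPC announcement itself):

```python
# Minimal sketch: NumPy-style array code dispatched to CUDA-X libraries
# via CuPy. Requires an Nvidia GPU and a matching `cupy` wheel installed.
import cupy as cp

a = cp.random.rand(4096, 4096, dtype=cp.float32)
b = cp.random.rand(4096, 4096, dtype=cp.float32)

c = a @ b                    # dense matrix multiply, backed by cuBLAS
spectrum = cp.fft.fft(a[0])  # fast Fourier transform, backed by cuFFT

cp.cuda.Stream.null.synchronize()  # wait for the GPU to finish
print(c.shape, spectrum.shape)
```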

CUDA-X HPC’s compilers support popular languages such as C/C++, Python, and Fortran, and apps built on CUDA-X HPC can be deployed on a range of hardware, from IoT devices and desktops to datacenters and supercomputers.

DGX SuperPod

Alongside the partnership and CUDA-X HPC announcements this morning, Nvidia revealed what it claims is the world’s 22nd-fastest supercomputer: the DGX SuperPod. VP of AI infrastructure Clement Farabet says it will accelerate the company’s autonomous vehicle development.

[Image: The Nvidia DGX SuperPod. Credit: Nvidia]

“AI leadership demands leadership in compute infrastructure,” said Farabet. “Few AI challenges are as demanding as training autonomous vehicles, which requires retraining neural networks tens of thousands of times to meet extreme accuracy needs. There’s no substitute for massive processing capability like that of the SuperPod.”

The SuperPod contains 96 DGX-2H units and 1,536 V100 Tensor Core graphics chips in total, interconnected with Mellanox and Nvidia’s NVSwitch technologies. It’s about 400 times smaller than comparable top-ranked supercomputing systems and takes as little as three weeks to assemble, while delivering 9.4 petaflops of computing performance. In real-world tests, it managed to train the benchmark AI model ResNet-50 in less than two minutes.
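For a sense of the workload behind that benchmark, here is a minimal single-GPU training step for ResNet-50 in PyTorch (the batch size and hyperparameters are placeholder assumptions, not Nvidia’s record-setting configuration; at SuperPod scale the same step runs across thousands of GPUs with NCCL-backed gradient all-reduce):

```python
# Minimal sketch of one ResNet-50 training step, the benchmark cited above.
# Synthetic data and placeholder hyperparameters; a real run iterates over
# ImageNet, sharded across many GPUs with NCCL gradient all-reduce.
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50(num_classes=1000).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

images = torch.randn(32, 3, 224, 224, device=device)   # fake ImageNet batch
labels = torch.randint(0, 1000, (32,), device=device)  # fake class labels

optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()  # in data-parallel training, gradients are all-reduced here
optimizer.step()
print(f"loss after one step: {loss.item():.3f}")
```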

Customers can buy the SuperPod whole or in part from any of Nvidia’s DGX-2 partners.