Google dives into the 'supercomputer' game by knitting together purpose-built GPUs for large language model training

AI scientists and anyone else with very large computational needs will now be able to turn to Google’s cloud to rent machines that may deliver as much as 26 exaFLOPs. The new cloud offerings, detailed in today’s keynote at Google I/O 2023, revive the Cold War-era label “supercomputer” because of their extraordinary capabilities and their focus on very large tasks.

The new machines are built by combining Nvidia’s H100 GPUs with Google’s own high-speed interconnections. The company expects that the combination of fast GPUs linked by fast data pathways will be very attractive for AI tasks like training very large language models.

Very large language models

The rise of these very large models is reigniting interest in hardware that can handle very large workloads efficiently. AI scientists have seen the most jaw-dropping results when they make a model as large as possible. New machines like these will make it easier to push models to ever larger sizes.

Google’s new machines are attractive because they speed up communication between the GPUs, which in turn shortens the time a model takes to converge during training. The Nvidia GPUs will communicate using what Google describes as “custom-designed 200-Gbps IPUs” that offer “GPU-to-GPU data transfers bypassing the CPU host and flowing over separate interfaces from other VM networks and data traffic.” The company estimates that data will flow between the GPUs 10 times faster than on some of its earlier hardware, which used more traditional communication paths.

Many cloud services offer machines that deliver the highly parallel performance of a GPU or TPU. Amazon Web Services, for example, offers a half-dozen different options that combine several GPUs or some of its new ARM-based Graviton chips. Google itself offers its own chips, dubbed TPUs, in a number of combinations.

At the same time, regular GPUs are becoming commonplace. Even some of the smaller clouds, like Vultr, rent GPUs at rates as low as 13 cents per hour for a fraction of a machine.

Google is clearly aiming at the biggest workloads with this announcement. Its new machines, labeled the A3, will bundle up to 8 Nvidia H100 GPUs built on the graphics chip maker’s Hopper architecture. Each machine may also have up to 2 terabytes of RAM for storing the training data. All of this will be coordinated by a fourth-generation Intel Xeon processor.

Google is part of a bigger game

Google is not the only company headed down this path. In November, Microsoft announced a partnership with Nvidia to produce its own “supercomputer.” Microsoft will also be using chips like the H100 as building blocks for interconnected “fabrics” or “meshes” optimized for training these very large models.

In February, IBM announced that it is also building its own version, dubbed “Vela,” that can train very large models for some of its government customers, like NASA. These “foundation models” will help in many fields, such as drug discovery and cybersecurity.

Another big goal for Google will be integrating this new hardware with its software and cloud offerings. OpenAI, for instance, resells Azure’s computation by making it possible for its own users to fine-tune their own foundation models.

Google says the hardware will be available through Vertex AI for customers “looking to develop complex ML models without the maintenance.” At the same time, the company is also announcing expanded features and more foundation models.
