Cerebras' CS-2 brain-scale chip can power AI models with 120 trillion parameters

Cerebras Systems said its CS-2 Wafer Scale Engine 2 processor is a "brain-scale" chip that can power AI models with more than 120 trillion parameters.

Parameters are the parts of a machine-learning model that are learned from historical training data in a neural network. The more parameters, the more sophisticated the AI model. And that's why Cerebras believes its latest processor -- which is actually built on an entire wafer instead of just individual chips -- is going to be so powerful, founder and CEO Andrew Feldman said in an interview with VentureBeat.

Feldman, who also founded SeaMicro, gave us a preview of his talk at the Hot Chips semiconductor design conference that is being held online today. The 120 trillion-parameter news follows an announcement by Google researchers back in January that they had trained a model with a total of 1.6 trillion parameters. Feldman noted that Google had raised the number of parameters about 1,000 times in just two years.
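
To put that growth rate in perspective, here is a rough back-of-the-envelope calculation. It assumes the 1,000-fold increase was a smooth exponential over 24 months, which is a simplification and not something stated in the interview.

```python
import math

# Figures from the article: roughly 1,000x growth over two years.
growth_factor = 1_000
months = 24

# Assumption: growth was a smooth exponential over the whole period.
yearly_factor = growth_factor ** (12 / months)        # growth per year
doubling_months = months / math.log2(growth_factor)   # months per doubling

print(f"About {yearly_factor:.1f}x per year")
print(f"Doubling roughly every {doubling_months:.1f} months")
```

Under that assumption, parameter counts would have grown about 30-fold per year, doubling every two to three months.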

"The number of parameters, the amount of memory necessary, have grown exponentially," Feldman said. "We have 1,000 times larger models requiring more than 1,000 times more compute, and that has happened in two years. We are announcing our ability to support up to 120 trillion parameters, to cluster 192 CS-2s together. Not only are we building bigger and faster clusters, but we're making those clusters more efficient."

Feldman said the tech will expand the size of the largest AI neural networks by 100 times.

"Larger networks, such as GPT-3, have already transformed the natural language processing (NLP) landscape, making possible what was previously unimaginable," he said. "The industry is moving past a trillion-parameter models, and we are extending that boundary by two orders of magnitude, enabling brain-scale neural networks with 120 trillion parameters."

Feldman said the Cerebras CS-2 is powered by the Wafer Scale Engine 2 (WSE-2), the largest chip ever made and the fastest AI processor to date. Purpose-built for AI work, the 7-nanometer WSE-2 has 2.6 trillion transistors and 850,000 AI-optimized cores. By comparison, the largest graphics processing unit has only 54 billion transistors, about 2.55 trillion fewer than the WSE-2. The WSE-2 also has 123 times more cores and 1,000 times more high-performance on-chip memory than graphics processing unit competitors.
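
The comparison can be checked with simple arithmetic. The sketch below uses only the figures quoted above. It assumes "123 times more cores" means a plain ratio of core counts, so the GPU core count it derives is an inference, not a number from Cerebras.

```python
# Figures quoted in the article.
wse2_transistors = 2_600_000_000_000   # 2.6 trillion
gpu_transistors = 54_000_000_000       # 54 billion
wse2_cores = 850_000
core_ratio = 123                       # "123 times more cores"

transistor_gap = wse2_transistors - gpu_transistors
transistor_ratio = wse2_transistors / gpu_transistors

# Assumption: the 123x claim is a simple ratio of core counts.
implied_gpu_cores = wse2_cores / core_ratio

print(f"Gap: {transistor_gap / 1e12:.2f} trillion transistors")
print(f"Ratio: about {transistor_ratio:.0f}x")
print(f"Implied GPU core count: about {implied_gpu_cores:,.0f}")
```

The gap works out to 2.546 trillion transistors, which rounds to the 2.55 trillion quoted, and the WSE-2 has roughly 48 times as many transistors as the GPU.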

The Cerebras wafers

The Cerebras CS-2 is a monster-sized chip.

The CS-2 is built for supercomputing tasks, and it's the second time since 2019 that Los Altos, California-based Cerebras has unveiled a chip that is basically an entire wafer.

Chipmakers normally slice a wafer from a 12-inch-diameter ingot of silicon to process in a chip factory. Once processed, the wafer is sliced into hundreds of separate chips that can be used in electronic hardware.

But Cerebras takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep all the cores functioning at high speeds so the transistors can work together as one. AI was used to design the actual chip itself, Synopsys CEO Aart De Geus said in an interview with VentureBeat.

Cerebras puts these wafers in a typical datacenter computing rack and connects them all together.

Brain-scale computing

The Cerebras CS-2 can fill out a datacenter rack.

To make a comparison, Feldman noted that the human brain contains on the order of 100 trillion synapses. The largest AI hardware clusters to date have been on the order of 1% of human-brain scale, or about 1 trillion synapse equivalents, known as parameters. At only a fraction of full human-brain scale, these clusters of graphics processors consume acres of space and megawatts of power and require dedicated teams to operate.

But Feldman said a single CS-2 accelerator -- the size of a dorm room refrigerator -- can support models of over 120 trillion parameters in size.

Four innovations

Cerebras can connect 192 CS-2s.

On top of that, he said Cerebras' new technology portfolio contains four innovations: Cerebras Weight Streaming, a new software execution architecture; Cerebras MemoryX, a memory extension technology; Cerebras SwarmX, a high-performance interconnect fabric technology; and Selectable Sparsity, a dynamic sparsity harvesting technology.

The Cerebras Weight Streaming technology can store model parameters off-chip while delivering the same training and inference performance as if they were on-chip. This new execution model disaggregates compute and parameter storage -- allowing researchers to flexibly scale size and speed independently -- and eliminates the latency and memory bandwidth issues that challenge large clusters of small processors.
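
The general idea of separating parameter storage from compute can be illustrated with a toy sketch. This is not Cerebras software. The store, the function names and the tiny three-layer model are all hypothetical, and plain Python lists stand in for both the external memory and the processor.

```python
from typing import Iterator, List

Matrix = List[List[float]]

# Hypothetical stand-in for an off-chip parameter store: one weight
# matrix per layer, kept outside the compute functions below.
external_store: List[Matrix] = [
    [[1.0, 0.0], [0.0, 1.0]],    # layer 0: identity
    [[2.0, 0.0], [0.0, 2.0]],    # layer 1: scale by 2
    [[1.0, 1.0], [1.0, -1.0]],   # layer 2: sum and difference
]


def stream_weights(store: List[Matrix]) -> Iterator[Matrix]:
    """Yield one layer's weights at a time, as if fetched on demand."""
    for layer_weights in store:
        yield layer_weights


def apply_layer(weights: Matrix, activations: List[float]) -> List[float]:
    """Plain matrix-vector product for a single layer."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


def forward(store: List[Matrix], inputs: List[float]) -> List[float]:
    """Run a forward pass while holding only one layer's weights at a time."""
    activations = inputs
    for weights in stream_weights(store):
        activations = apply_layer(weights, activations)
    return activations


result = forward(external_store, [3.0, 1.0])
print(result)
```

In this sketch the compute side never holds more than one layer's weights, so the size of the model is limited by the external store and not by the memory next to the processor.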

This dramatically simplifies the workload distribution model and is designed so users can scale from using 1 to up to 192 CS-2s with no software changes, Feldman said.

The Cerebras MemoryX will provide the second-generation Cerebras Wafer Scale Engine (WSE-2) with up to 2.4 petabytes of high-performance memory, all of which behaves as if it were on-chip. With MemoryX, the CS-2 can support models with up to 120 trillion parameters.
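
Dividing the two headline numbers gives a sense of how much memory that is per parameter. The calculation below assumes decimal petabytes (1 PB = 10^15 bytes). The article does not say how the capacity is actually divided up.

```python
# Figures quoted in the article.
memoryx_bytes = 2_400 * 10**12   # 2.4 petabytes, assuming 1 PB = 10**15 bytes
parameters = 120 * 10**12        # 120 trillion parameters

bytes_per_parameter = memoryx_bytes // parameters
print(f"{bytes_per_parameter} bytes of MemoryX capacity per parameter")
```

That comes to 20 bytes per parameter, several times what a single 16-bit or 32-bit weight needs. Training commonly keeps optimizer state alongside each weight, which is one plausible use for the headroom, though Cerebras is not quoted on this.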

Cerebras SwarmX is a high-performance, AI-optimized communication fabric that extends the Cerebras Swarm on-chip fabric to off-chip. SwarmX is designed to enable Cerebras to connect up to 163 million AI-optimized cores across up to 192 CS-2s, working in concert to train a single neural network.
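
The 163 million figure follows directly from the per-chip core count quoted earlier, as this quick check shows.

```python
# Figures quoted in the article.
cores_per_cs2 = 850_000   # AI-optimized cores on one WSE-2
max_systems = 192         # largest cluster Cerebras describes

total_cores = cores_per_cs2 * max_systems
print(f"{total_cores:,} cores across {max_systems} CS-2s")
```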

And Selectable Sparsity enables users to select the level of weight sparsity in their model and provides a direct reduction in FLOPs and time-to-solution. Weight sparsity is an exciting area of ML research that has been challenging to study, as it is extremely inefficient on graphics processing units.

Selectable sparsity enables the CS-2 to accelerate work and use every available type of sparsity -- including unstructured and dynamic weight sparsity -- to produce answers in less time.
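
Why sparsity cuts the amount of arithmetic can be seen in a small, generic example. The sketch below is not how the CS-2 implements sparsity. It is a plain-Python illustration, with hypothetical function names, that counts multiply-add operations when zero weights are processed and when they are skipped.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_matvec(weights: Matrix, x: List[float]) -> Tuple[List[float], int]:
    """Multiply every weight, zero or not, and count multiply-adds."""
    ops = 0
    out = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            total += w * xi
            ops += 1
        out.append(total)
    return out, ops


def sparse_matvec(weights: Matrix, x: List[float]) -> Tuple[List[float], int]:
    """Skip zero weights entirely and count only the work actually done."""
    ops = 0
    out = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:
                total += w * xi
                ops += 1
        out.append(total)
    return out, ops


# An unstructured sparse weight matrix: 4 of 16 entries are nonzero.
weights = [
    [0.5, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_ops = dense_matvec(weights, x)
sparse_out, sparse_ops = sparse_matvec(weights, x)
print(dense_out == sparse_out, dense_ops, sparse_ops)
```

Both versions give the same result, but the sparse one does 4 multiply-adds where the dense one does 16. Hardware that processes zeros as fast as nonzeros gains nothing from such a matrix, which is the inefficiency the article attributes to graphics processing units.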

This combination of technologies will allow users to unlock brain-scale neural networks and distribute work over enormous clusters of AI-optimized cores with push-button ease, Feldman said.

Rick Stevens, associate director at the federal Argonne National Laboratory, said in a statement that the last several years have shown that the more parameters a natural language processing model has, the better its results. Cerebras' inventions could transform the industry, he said.

Founded in 2016, Cerebras has more than 350 employees. The company will announce customers in the fourth quarter.
