Cerebras' wafer-size chip is 10,000 times faster than a GPU

Cerebras Systems and the federal Department of Energy's National Energy Technology Laboratory (NETL) today announced that the company's CS-1 system is more than 10,000 times faster than a graphics processing unit (GPU).

On a practical level, this means AI neural networks that previously took months to train can now train in minutes on the Cerebras system.
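To put the speedup in wall-clock terms, the sketch below divides a baseline run time by a constant speedup factor. It is illustrative only: the 90-day baseline and the helper name `accelerated_minutes` are hypothetical, and it assumes the 10,000x figure cited above applies uniformly to an entire job, which real workloads rarely do.

```python
# Hypothetical illustration: how a constant speedup factor shrinks a job's wall-clock time.
MINUTES_PER_DAY = 24 * 60


def accelerated_minutes(baseline_days: float, speedup: float) -> float:
    """Return the wall-clock minutes a job would take if it ran `speedup` times faster."""
    return baseline_days * MINUTES_PER_DAY / speedup


# A 90-day (roughly three-month) job at the 10,000x figure cited above.
print(round(accelerated_minutes(90, 10_000), 2))  # 12.96
```

A three-month job would shrink to about 13 minutes, which matches the "months to minutes" framing only if the whole job speeds up by the full factor.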

Cerebras makes the world's largest computer chip, the Wafer Scale Engine (WSE). Chipmakers normally slice a wafer from a 12-inch-diameter ingot of silicon and process it in a chip factory. Once processed, the wafer is cut into hundreds of separate chips that can be used in electronic hardware.
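Why "hundreds" of chips? A common first-order estimate of whole dies per wafer divides the wafer area by the die area, then subtracts the partial dies lost along the circular edge. The sketch below applies that textbook approximation; the 100 mm² and 800 mm² die sizes are hypothetical, and the formula ignores scribe lines, edge exclusion, and defects.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of whole dies per wafer.

    Gross term: wafer area / die area.
    Edge term: approximate partial dies lost around the circular edge.
    Ignores scribe lines, edge exclusion and defects.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


# A 12-inch (~300 mm) wafer with hypothetical die sizes.
print(dies_per_wafer(300, 100))  # 640
print(dies_per_wafer(300, 800))  # 64
```

Smaller dies yield hundreds per wafer; very large GPU-class dies yield only dozens. Cerebras skips the cutting step entirely.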

But Cerebras, started by SeaMicro founder Andrew Feldman, takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep all the cores functioning at high speeds so the transistors can work together as one.

Cerebras' CS-1 system uses the WSE wafer-size chip, which has 1.2 trillion transistors, the basic on-off electronic switches that are the building blocks of silicon chips. Intel's first microprocessor, the 4004, had 2,300 transistors in 1971, and the Nvidia A100 80GB chip, announced yesterday, has 54 billion transistors.
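The transistor counts in the paragraph above translate into the following ratios. This is simple arithmetic on the quoted figures; transistor count alone says nothing direct about speed.

```python
# Transistor counts as quoted in the article.
WSE_TRANSISTORS = 1.2e12
A100_TRANSISTORS = 54e9
INTEL_4004_TRANSISTORS = 2_300

print(f"WSE vs A100: {WSE_TRANSISTORS / A100_TRANSISTORS:.1f}x")  # WSE vs A100: 22.2x
print(f"WSE vs 4004: {WSE_TRANSISTORS / INTEL_4004_TRANSISTORS:,.0f}x")  # WSE vs 4004: 521,739,130x
```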

Feldman said in an interview with VentureBeat that the CS-1 was also 200 times faster than the Joule Supercomputer, which ranks No. 82 on the Top500 list of the world's fastest supercomputers.

"It shows record-shattering performance," Feldman said. "It also shows that wafer scale technology has applications beyond AI."

These results are the fruits of the radical approach Los Altos, California-based Cerebras has taken: building a silicon wafer with 400,000 AI cores on it instead of slicing that wafer into individual chips. The unusual design speeds up computation because the processors and memory sit close to each other and are linked by abundant bandwidth, Feldman said. It remains an open question how widely the approach applies to other computing tasks.
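One standard way to reason about why memory bandwidth matters is the roofline model: attainable throughput is the lesser of a chip's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (operations per byte moved). The sketch below is not Cerebras' own analysis; both chips and the 0.5 FLOP-per-byte intensity are hypothetical, chosen to resemble a bandwidth-bound kernel.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Attainable throughput (GFLOP/s) under the roofline model."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Two hypothetical chips with the same compute peak but 10x different memory bandwidth,
# running a low-intensity kernel at 0.5 FLOP per byte.
low_bw = roofline_gflops(peak_gflops=10_000, bandwidth_gb_s=1_000, flops_per_byte=0.5)
high_bw = roofline_gflops(peak_gflops=10_000, bandwidth_gb_s=10_000, flops_per_byte=0.5)
print(low_bw, high_bw)  # 500.0 5000.0
```

For a kernel like this, the extra bandwidth, not extra compute, is what raises throughput tenfold.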

A paper based on the results of Cerebras' work with the federal lab said the CS-1 can deliver performance that is unattainable with any number of central processing units (CPUs) and GPUs, which are both commonly used in supercomputers. (Nvidia's GPUs are now used in 70% of the top supercomputers.) Feldman added that this is true "no matter how large that supercomputer is."

Cerebras is presenting at the SC20 supercomputing online event this week. The CS-1 beat the Joule Supercomputer on a computational fluid dynamics workload, which simulates the movement of fluids in places such as a carburetor. The Joule Supercomputer cost tens of millions of dollars to build, has 84,000 CPU cores spread over dozens of racks, and consumes 450 kilowatts of power.

In this demo, the Joule Supercomputer used 16,384 cores, and the Cerebras computer was 200 times faster, according to energy lab director Brian Anderson. A CS-1 costs several million dollars and uses 20 kilowatts of power.

"For these workloads, the wafer-scale CS-1 is the fastest machine ever built," Feldman said. "And it is faster than any other combination or cluster of other processors."

A single Cerebras CS-1 is 26 inches tall, fits in one-third of a rack, and is powered by the industry's only wafer-scale processing engine, Cerebras' WSE. It combines memory performance with massive bandwidth, low-latency interprocessor communication, and an architecture optimized for high-bandwidth computing.

The research was led by Dirk Van Essendelft, machine learning and data science engineer at NETL, and Michael James, Cerebras cofounder and chief architect of advanced technologies. The results came after months of work.

In September 2019, the Department of Energy announced its partnership with Cerebras, including deployments with Argonne National Laboratory and Lawrence Livermore National Laboratory.

The Cerebras CS-1 was announced in November 2019. The CS-1 is built around the WSE, which is 56 times larger in area and has 54 times more cores, 450 times more on-chip memory, 5,788 times more memory bandwidth, and 20,833 times more fabric bandwidth than the leading GPU competitor, Cerebras said.

Cerebras says that, depending on the workload, from AI to HPC, the CS-1 delivers hundreds or thousands of times more compute than legacy alternatives while using a fraction of the power and space.

Feldman noted that the CS-1 can finish calculations faster than real time, meaning it can start simulating a power plant's reactor core when the reaction starts and finish the simulation before the reaction ends.
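"Faster than real time" is usually expressed as a real-time factor: simulated physical time divided by the wall-clock time needed to compute it. A factor above 1 means the simulation outruns the process it models. The function name and the 10-second/4-second figures below are hypothetical.

```python
def real_time_factor(simulated_seconds: float, wall_clock_seconds: float) -> float:
    """Simulated physical time per unit of compute time; > 1 means faster than real time."""
    return simulated_seconds / wall_clock_seconds


# Hypothetical: 10 s of a physical process simulated in 4 s of compute.
print(real_time_factor(10.0, 4.0))  # 2.5
```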

"These dynamic modeling problems have an interesting characteristic," Feldman said. "They scale poorly across CPU and GPU cores. In the language of the computational scientist, they do not exhibit 'strong scaling.' This means that beyond a certain point, adding more processors to a supercomputer does not yield additional performance gains."
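A simple, widely used model of the strong-scaling limit Feldman describes is Amdahl's law: for a fixed problem size, any serial fraction of the work caps the achievable speedup no matter how many processors are added. The sketch below assumes a hypothetical workload that is 95% parallelizable. Amdahl's law ignores communication overhead, which in practice can make scaling even worse than shown, so treat it as an optimistic bound.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Strong-scaling speedup predicted by Amdahl's law for a fixed problem size."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# Hypothetical workload that is 95% parallelizable; speedup can never exceed 1 / 0.05 = 20.
for n in (16, 1_024, 16_384):
    print(n, round(amdahl_speedup(0.95, n), 1))
# 16 9.1
# 1024 19.6
# 16384 20.0
```

Going from 1,024 to 16,384 processors (the core count in the Joule run described above) buys almost nothing under these assumptions.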

Cerebras has raised $450 million and has 275 employees.
