Why the world's biggest supercomputer will use graphics chips (video)

Steve Scott, chief technology officer for Tesla graphics chip products at Nvidia, was feeling pretty good this week after the Oak Ridge National Laboratory announced it will build the world's fastest supercomputer using 18,000 high-end graphics chips from Nvidia.

The new kind of supercomputer means that graphics chips have finally become as important as the microprocessor in tackling the world's most difficult computing problems. That should cause a sea change in revenues for the chip industry.

The machine will be built by Cray and will also use 18,000 Advanced Micro Devices microprocessors, or central processing units (CPUs). Once upon a time, it made sense to build supercomputers with CPUs alone. But CPU-only processing is now bound by power constraints. CPUs are designed to process operations serially, one after another. That's fine for smaller problems. But supercomputers have to run thousands of similar operations in parallel. That kind of processing is best suited to graphics chips, or graphics processing units (GPUs), which have as many as 512 cores, or processing elements, on a single chip.
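The difference between serial and parallel processing can be sketched in a few lines of Python. This is only an illustration of the idea, not how a GPU is actually programmed: the `scale` operation, the function names and the choice of four workers are all made up for the example, and Python threads stand in for the hundreds of cores on a real graphics chip.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    # The "similar operation" applied to every data element (hypothetical).
    return 2.0 * x + 1.0


def run_serial(values):
    # CPU-style: handle the elements one after another.
    return [scale(v) for v in values]


def run_parallel(values, workers=4):
    # GPU-style idea: split the data into chunks and let each worker
    # apply the same operation to its own chunk.
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_serial, chunks)  # map() keeps chunk order
    return [y for part in results for y in part]


data = [float(i) for i in range(1000)]
print(run_parallel(data) == run_serial(data))
```

Both functions produce the same answers. The point is that the parallel version's work can be spread across many processing elements because no element depends on another.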

Scott said in an interview at Nvidia's headquarters in Santa Clara, Calif., that adding one graphics chip to a supercomputer can eliminate the need for five to eight CPUs. That saves a lot of cost and power at the same time. This kind of sea change in supercomputing is why Scott left Cray, where he was chief technology officer for six years, to join graphics chip maker Nvidia.
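To put Scott's ratio in perspective, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (18,000 graphics chips, each replacing five to eight CPUs). The function name `cpu_equivalents` is made up for illustration, and the result is a rough estimate, not a number published by Nvidia or Cray.

```python
def cpu_equivalents(gpu_count, cpus_per_gpu_low=5, cpus_per_gpu_high=8):
    """Return the (low, high) number of CPUs that gpu_count GPUs could replace."""
    return gpu_count * cpus_per_gpu_low, gpu_count * cpus_per_gpu_high


low, high = cpu_equivalents(18_000)
print(f"{low:,} to {high:,} CPUs")  # 90,000 to 144,000 CPUs
```

By that estimate, the machine's graphics chips stand in for somewhere between 90,000 and 144,000 additional CPUs.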

"Energy is the new constraint," Scott said. "If you pack too many transistors on a chip and run them as fast as possible, the chip will melt. Over time, this problem is getting exponentially worse."

You can bet that more supercomputers will use this kind of solution. Every major government has the goal of doing exascale computing, or computing at a rate of one exaflop.

Scott is one of those people who can tell you what an exaflop is. It's a billion billion floating point operations per second. A single floating point operation is the equivalent of taking two 15-digit numbers and multiplying them together. Right now, the Oak Ridge supercomputer is expected to be able to compute at 20 petaflops. A thousand petaflops is equal to one exaflop.
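For readers who want to check the arithmetic, the units work out as follows. This is a small sketch using only the figures in this article. The constant and function names are invented for the example.

```python
PETAFLOP = 10**15  # floating point operations per second
EXAFLOP = 10**18   # "a billion billion": 10**9 * 10**9


def gap_to_exascale(petaflops):
    """How many times faster a machine must get to reach one exaflop."""
    return EXAFLOP / (petaflops * PETAFLOP)


print(EXAFLOP == 1000 * PETAFLOP)  # True
print(gap_to_exascale(20))         # 50.0
```

In other words, even at 20 petaflops the Oak Ridge machine would need to become 50 times faster to reach exascale.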

Oak Ridge needs this kind of computing power because it will take exascale computing to do climate change simulations right. Scientists believe they need to be able to model climate effects at a resolution of one kilometer in order to simulate the climate for the entire earth accurately. You just can't do that kind of calculation with today's supercomputers.

Oak Ridge will also need the computing power to study nuclear energy and how to make it safer, as well as to do research on new biofuels and more efficient internal combustion engines, Scott said. And at some point, putting graphics chips in supercomputers won't be unusual.

"In five to ten years, this will just be the way you build computers," Scott said.

Nvidia's Tesla graphics chips are now in three of the top five supercomputers. That kind of penetration has to make Intel nervous. Right now, Intel is selling only CPUs to the supercomputer makers. But it is working on massively parallel microprocessor chips that emulate more of the graphics chip functionality.

It's a problem for Intel because Nvidia is cutting into sales of Intel's highest value chips, which are CPUs for servers and supercomputers. Intel can keep its average selling prices constant, and therefore its profits constant, because it sells high-end CPUs for high prices, making up for the low-end CPUs that it sells at low prices.

By selling high-value graphics chips, Nvidia can create the same kind of business as Intel, and steal market share from it. That's enough to make Intel nervous.

Interestingly, gamers are making this all possible. Nvidia can afford to design and sell its $2,000 Tesla GPUs because it can bring the costs for manufacturing these chips down. It does so by selling millions of graphics chips for game machines and ordinary PCs.

Over time, ordinary graphics chips will become capable of processing at speeds that only supercomputers can reach today. Hopefully, we'll all have the equivalent of Titan, as the Oak Ridge machine is called, in our home computers.

Check out our video interview with Scott below.
