IBM researchers design a fast, power-efficient chip for AI training

Thanks to powerful graphics chips and advances in distributed computing, optimizing the algorithms at the core of artificial intelligence is easier than ever before. But it's not particularly efficient on current-day hardware -- even powerful GPUs can take days or weeks to train a neural network.

That prompted researchers at IBM to develop a new chip tailor-made for AI training. In a paper published in the journal Nature titled "Equivalent-accuracy accelerated neural-network training using analog memory," they describe a system of transistors and capacitors that can train neural networks quickly, precisely, and with high energy efficiency.

Neural networks consist of interconnected units called neurons or nodes (a collection of nodes is called a layer), which receive numerical inputs. In a basic network, individual neurons multiply those inputs by a value -- a weight -- and pass them along to an activation function, which defines the output of the node. Through a strategy known as backpropagation, the weights are adjusted over time, improving the accuracy of the outputs.
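To make the weight-and-activation idea concrete, here is a minimal sketch of a single neuron in plain Python. It is illustrative only and is not taken from the IBM paper: the sigmoid activation, the squared-error loss, the learning rate, and the function names are all assumptions chosen for brevity, and a real network would apply the same kind of update across many layers.

```python
import math

def sigmoid(z):
    # Activation function (an assumed choice; others are common too).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights):
    # Multiply each input by its weight, sum, and apply the activation.
    z = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(z)

def train_step(inputs, weights, target, learning_rate=0.5):
    # One gradient-descent update for a single neuron, assuming the
    # loss is 0.5 * (output - target) ** 2.
    out = neuron_output(inputs, weights)
    # Derivative of the loss with respect to the weighted sum z.
    delta = (out - target) * out * (1.0 - out)
    # Nudge each weight against its gradient.
    return [w - learning_rate * delta * x for x, w in zip(inputs, weights)]

inputs = [1.0, 0.5]
weights = [0.1, -0.2]
target = 1.0

error_before = abs(target - neuron_output(inputs, weights))
for _ in range(100):
    weights = train_step(inputs, weights, target)
error_after = abs(target - neuron_output(inputs, weights))

print(error_before, error_after)
```

Each pass adjusts the weights slightly, so the output drifts toward the target over repeated steps.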

GPUs are well-suited to these computations because, unlike traditional processors, which crunch through numbers sequentially, they're able to perform lots of calculations in parallel. But because the processor and memory in graphics chips sit a considerable distance apart from one another on the motherboard, delays are introduced as data shuttles back and forth between them.

"Conventional computers [consume] an enormous amount of energy," Stefano Ambrogio, a postdoctoral researcher at IBM who led the project, told VentureBeat in an interview, "and there's a lot of waiting involved."

The scientists' solution consists of analog memory and traditional electronic components. Individual cells made up of a pair of phase-change memory (PCM) units and a combination of a capacitor and three transistors correspond to individual neurons in the network. The PCMs store weight data in memory, which is represented in the transistors and capacitors as an electrical charge.

As the network trains, the capacitor updates the weights, transferring them to the PCM after thousands of cycles.

The capacitor can't retain values for more than a few milliseconds, but it can be programmed quickly. And the PCM, which is a form of non-volatile memory, doesn't need an external power source to retain data.
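The division of labor between the two components can be sketched as a toy software model. This is a hedged illustration, not the researchers' circuit: the class name, the fixed transfer interval of 1,000 cycles, and the idea that the effective weight is the simple sum of the two stored values are all assumptions, and the model ignores charge leaking from the capacitor.

```python
class HybridWeightCell:
    """Toy model of one weight: a fast volatile part plus a slow durable part."""

    def __init__(self, transfer_interval=1000):
        self.pcm = 0.0          # non-volatile value, changed rarely
        self.capacitor = 0.0    # volatile value, changed every cycle
        self.transfer_interval = transfer_interval  # hypothetical setting
        self.cycles = 0

    @property
    def weight(self):
        # Assumed here: the effective weight is the sum of both parts.
        return self.pcm + self.capacitor

    def update(self, delta):
        # Fast path: every training cycle adjusts only the capacitor.
        self.capacitor += delta
        self.cycles += 1
        if self.cycles % self.transfer_interval == 0:
            self.transfer()

    def transfer(self):
        # Slow path: fold the accumulated change into the PCM and reset.
        self.pcm += self.capacitor
        self.capacitor = 0.0


cell = HybridWeightCell(transfer_interval=1000)
for _ in range(2500):
    cell.update(0.001)

print(cell.pcm, cell.capacitor, cell.weight)
```

In this sketch, the frequent small updates land on the component that is quick to program, while the component that holds data without power is written only occasionally.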

The researchers used a mix of hardware PCMs and software-simulated components to benchmark the design, and the results are promising. The chip performed 100 times more calculations per square millimeter than a GPU while using 280 times less power. Even more impressive, it matched the speed and accuracy of Google's TensorFlow machine learning framework on a variety of computer vision tasks.

"We can do [the calculations] in a very accurate way, at the same accuracy as software," Ambrogio said.

The researchers' chip design comes with a significant caveat: It's not optimized for neural networks that aren't fully connected, such as the long short-term memory (LSTM) networks used in cutting-edge speech recognition apps. But the researchers plan to tackle that next.

Ambrogio is confident they'll be able to build physical chips at scale in the future. He sees them being used for training neural networks in smartphones and other devices that currently lack the necessary computing resources.

"It would be nice to be able to directly process AI where it's needed," Ambrogio said. "When you're able to train a model, you don't need to send the information [to the cloud] or have [the device] communicate with something else, and it can react instantly to something."
