IBM's 8-bit AI training technique is up to 4 times faster while retaining accuracy

Computational efficiency is the name of the game in artificial intelligence (AI). It's not easy maintaining a balance between training speed, accuracy, and energy consumption, but recent hardware advances have made the goal more attainable than it once was. Case in point: IBM will this week showcase AI training methods that it says deliver markedly better speed and energy efficiency than the previous state of the art.

The first of the Armonk, New York company's breakthroughs is an accelerated digital technique that achieves full accuracy with 8-bit precision. The second is an 8-bit precision technique for an analog chip -- the highest precision of its kind to date, IBM claims -- that roughly doubles accuracy.

Both were detailed today in Montreal at NeurIPS 2018, one of the world's largest AI and machine learning conferences.

"The coming generation of AI applications will need faster response times, bigger AI workloads, and multimodal data from numerous streams. To unleash the full potential of AI, we are redesigning hardware with AI in mind: from accelerators to purpose-built hardware for AI workloads, like our new chips, and eventually quantum computing for AI," Jeffrey Welser, vice president and lab director at IBM Research-Almaden, wrote in a blog post. "Scaling AI with new hardware solutions is part of a wider effort at IBM Research to move from narrow AI, often used to solve specific, well-defined tasks, to broad AI, which reaches across disciplines to help humans solve our most pressing problems."

Moving from relatively high-precision (16-bit) floating point arithmetic to low-precision (8-bit) floating point might sound counterintuitive, but tasks like speech recognition and language translation aren't necessarily that exacting. Making do with approximations opens the door to significant power efficiency and performance gains. As Welser explains, the "computational building blocks" with 16-bit precision engines are on average 4 times smaller than comparable blocks with 32-bit precision.

In a paper titled "Training Deep Neural Networks with 8-bit Floating Point Numbers," IBM researchers describe how they were able to both reduce the arithmetic precision for additions from 32 bits to 16 bits and preserve accuracy at 8-bit precision across models like ResNet50, AlexNet, and BN50_DNN, as well as a range of image, speech, and text datasets. They claim their technique speeds up training of deep neural networks by 2 to 4 times over 16-bit systems.
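
To make the idea of 8-bit operands with 16-bit additions more concrete, here is a minimal Python sketch of a dot product computed that way. It is an illustration under stated assumptions, not IBM's implementation: the helper names `quantize` and `low_precision_dot` are hypothetical, and the bit layouts used (2 mantissa bits and 5 exponent bits for the 8-bit values, 9 and 6 for the 16-bit accumulator) are assumed for the example, since the article does not specify the formats.

```python
import math
import random


def quantize(x, mantissa_bits, exp_bits):
    """Round x to the nearest value in a simplified binary float format.

    Hypothetical helper: it handles sign, normal numbers, a fixed step
    below the smallest exponent, and saturation at the largest finite
    value. NaN and infinity are passed through unchanged.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    _, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    exp = max(e - 1, 1 - bias)        # exponent of the leading bit, clamped below
    step = 2.0 ** (exp - mantissa_bits)
    rounded = round(x / step) * step  # round() breaks ties to even
    largest = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** bias
    return max(-largest, min(largest, rounded))


def low_precision_dot(a, b, op_fmt=(2, 5), acc_fmt=(9, 6)):
    """Dot product with 8-bit-style operands and a 16-bit-style accumulator.

    op_fmt and acc_fmt are (mantissa_bits, exp_bits) pairs; the defaults
    are assumptions made for this sketch.
    """
    acc = 0.0
    for x, y in zip(a, b):
        # Multiply two low-precision operands (exact in Python floats).
        prod = quantize(x, *op_fmt) * quantize(y, *op_fmt)
        # Every addition is rounded back to the accumulator format.
        acc = quantize(acc + prod, *acc_fmt)
    return acc


if __name__ == "__main__":
    random.seed(0)
    a = [random.uniform(-1, 1) for _ in range(256)]
    b = [random.uniform(-1, 1) for _ in range(256)]
    print("float64 reference:", sum(x * y for x, y in zip(a, b)))
    print("8-bit operands, 16-bit additions:", low_precision_dot(a, b))
```

Running the script prints the full-precision result next to the reduced-precision one, which gives a rough feel for how much error the narrower formats introduce on a single dot product. It says nothing about end-to-end training accuracy, which is what the paper evaluates.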

A second paper -- "8-bit Precision In-Memory Multiplication with Projected Phase-Change Memory" -- describes a method that compensates for the low intrinsic precision of analog AI chips, enabling them to hit 8-bit precision in a scalar multiplication operation and roughly double accuracy while consuming 33 times less energy than comparable digital AI systems.

The paper's authors propose in-memory computing, in which the memory performs the dual role of storing data and processing it, as an alternative to traditional memory. That architectural tweak alone can reduce energy usage by 90 percent or more, and additional performance gains come from phase-change memory (PCM), which has a conductance that can be modified with electrical pulses. This property enables it to perform calculations, and the researchers' projected PCM (Proj-PCM) renders PCM largely immune to variations in conductance, allowing it to achieve much higher precision than previously possible.
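
The link between conductance variation and precision can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not the method from the paper: it assumes a weight is stored as a conductance, an input is applied as a voltage, and the product is read out as a current (Ohm's law), with Gaussian noise standing in for conductance variation. The function names and the noise levels are hypothetical and are not measurements of any real device.

```python
import math
import random


def analog_multiply(weight, x, noise_std, rng):
    """Toy analog multiply: current = conductance * voltage.

    The stored conductance deviates from the intended weight by
    Gaussian noise (an assumed noise model for illustration only).
    """
    conductance = weight + rng.gauss(0.0, noise_std)
    return conductance * x


def effective_bits(noise_std, trials=20000, seed=0):
    """Estimate precision as log2(output range / RMS error).

    Weights and inputs are drawn from [0, 1], so the output range is 1.
    This metric is a rough, hypothetical yardstick for the sketch.
    """
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(trials):
        w = rng.uniform(0.0, 1.0)
        x = rng.uniform(0.0, 1.0)
        err = analog_multiply(w, x, noise_std, rng) - w * x
        squared_error += err * err
    rms = math.sqrt(squared_error / trials)
    return math.inf if rms == 0.0 else math.log2(1.0 / rms)


if __name__ == "__main__":
    # Arbitrary illustrative noise levels, not device data.
    for noise in (0.05, 0.01, 0.002):
        print(f"noise_std={noise}: ~{effective_bits(noise):.1f} effective bits")
```

In this toy model, every tenfold reduction in conductance noise buys a little over 3 bits of effective precision, which is one way to see why making the device less sensitive to conductance variation matters for reaching 8-bit multiplication.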

"The improved precision achieved by our research team indicates in-memory computing may be able to achieve high-performance deep learning in low-power environments, such as IoT and edge applications," Welser wrote. "As with our digital accelerators, our analog chips are designed to scale for AI training and inferencing across visual, speech, and text datasets and extend to emerging broad AI."
