IBM proposes AI chip with benchmark-beating power efficiency

IBM claims to have developed one of the world's first energy-efficient chips for AI inference and training built with 7-nanometer technology. In a paper presented at the 2021 International Solid-State Circuits Conference, held virtually in early February, a team of researchers at the company detailed a hardware accelerator that supports a range of model types while achieving "leading" power efficiency on all of them.

AI accelerators are a type of specialized hardware designed to speed up AI applications, particularly neural networks, deep learning, and machine learning. They're multicore in design and focus on low-precision arithmetic or in-memory computing, both of which can boost the performance of large AI algorithms and lead to state-of-the-art results in natural language processing, computer vision, and other domains.

IBM says its four-core chip, which remains in the research stages, is optimized for low-precision workloads across a number of different AI and machine learning models. Low-precision techniques require less silicon area and power than their high-precision counterparts, enabling better cache usage and reducing memory bottlenecks. This often leads to a decrease in the time and energy cost of training AI models.
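The storage side of that argument is simple arithmetic: halving the bit width of each value halves the bytes that have to fit in cache and move across the memory bus. The following is a minimal sketch of that calculation, not anything taken from IBM's paper; the parameter count and the function name `weight_footprint_mib` are hypothetical and chosen only for illustration.

```python
# Bytes needed to store one value at each precision.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_footprint_mib(num_params: int, fmt: str) -> float:
    """Return the storage needed for num_params weights, in MiB."""
    return num_params * BYTES_PER_VALUE[fmt] / (1024 ** 2)


# Hypothetical model size, used only as an example.
NUM_PARAMS = 25_000_000

for fmt in ("fp32", "fp16", "fp8"):
    print(f"{fmt}: {weight_footprint_mib(NUM_PARAMS, fmt):.1f} MiB")
```

The same weights take a quarter of the space at 8 bits that they do at 32 bits, which is why lower precision can ease pressure on caches and memory bandwidth.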

IBM's AI accelerator chip is among the few to incorporate ultra-low precision "hybrid FP8" formats for training deep learning models in an extreme ultraviolet lithography-based package. It's also one of the first to feature power management, with the ability to maximize performance by slowing down during computation phases with high power consumption. And it offers high sustained utilization that ostensibly translates to superior real application performance.
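The article does not spell out the bit layout of the "hybrid FP8" formats. As a rough illustration of why a chip might mix more than one 8-bit format, the sketch below assumes two hypothetical layouts, one with 4 exponent bits and 3 mantissa bits and one with 5 exponent bits and 2 mantissa bits, and rounds ordinary Python floats to each. The `quantize` function is a simplified software model (IEEE-style bias, saturation on overflow, no infinities or NaNs) and is not a description of IBM's hardware.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value in a simplified small float format.

    Assumptions (illustrative only): IEEE-style exponent bias, subnormals
    supported, every exponent code used for finite values, and saturation
    to the largest finite value on overflow.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                  # smallest normal exponent
    max_exp = 2 ** exp_bits - 1 - bias  # largest exponent
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, e = math.frexp(mag)              # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)           # clamp into the subnormal range
    step = 2.0 ** (exp - man_bits)      # spacing of representable values
    rounded = round(mag / step) * step  # round to nearest, ties to even
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return sign * min(rounded, max_val)


for value in (0.1, 1000.0):
    print(value, quantize(value, 4, 3), quantize(value, 5, 2))
```

Under these assumptions, the layout with more mantissa bits lands closer to small values such as 0.1, while the layout with more exponent bits can still represent large values that the other format has to clip. That trade-off between precision and range is the general motivation for using different 8-bit formats in different parts of training.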

In experiments, IBM says its AI chip routinely achieved more than 80% utilization for training and more than 60% utilization for inference. Moreover, the chip's performance and power efficiency exceeded those of other dedicated inference and training chips.

IBM's goal in the next 2-3 years is to apply the novel AI chip design commercially to a range of applications, including large-scale training in the cloud, privacy, security, and autonomous vehicles. "Our new AI core and chip can be used for many new cloud to edge applications across multiple industries," IBM researchers Ankur Agrawal and Kailash Gopalakrishnan wrote in a blog post. "For instance, they can be used for cloud training of large-scale deep learning models in vision, speech and natural language processing using 8-bit formats (versus the 16- and 32-bit formats currently used in the industry). They can also be used for cloud inference applications, such as for speech to text AI services, text to speech AI services, natural language processing services, financial transaction fraud detection and broader deployment of AI models in financial services."