

Recent years have seen a growing demand for artificial intelligence (AI) acceleration hardware. IBM has taken note.

In the earliest days of AI, commercial CPU and GPU technologies were enough to handle the technology’s data sizes and computational parameters. But with the emergence of larger datasets and deep learning models, there is now a clear need for purpose-built AI hardware acceleration.

IBM is now throwing its hat into the hardware acceleration ring with its announcement this week of the IBM Artificial Intelligence Unit (AIU). The AIU is a full system-on-chip board that can plug into servers via an industry-standard PCIe interface.

The IBM Artificial Intelligence Unit is a full system-on-chip AI accelerator card that will plug into industry standard PCIe slots. Credit: IBM Research

The AIU is based on the same AI core built into IBM's Telum chip, which powers the IBM z16 series mainframes, including the LinuxONE Emperor 4. Each AIU has 32 cores fabricated with a 5-nanometer (nm) process, while the AI cores on the Telum processor use a 7nm process.


“Here at IBM Research, we have a very strong microarchitecture and circuit design team that has focused on high-performance designs primarily for HPC [high-performance computing] and servers for many decades,” Jeff Burns, director at IBM Research AI Hardware Center, told VentureBeat. “So the same folks started thinking about deep learning acceleration.”

Accelerating AI with IBM’s Artificial Intelligence Unit (AIU)

IBM began developing the basic ideas behind its AI accelerator in 2017 and has expanded on them in the years since.

The AI acceleration work was picked up by the IBM Systems Group, which integrated the technology into processors running in mainframes. Burns said that his team also wanted to design a complete system-on-chip along with a PCIe card to create a pluggable AI acceleration device that could go into IBM’s x86-based cloud, IBM’s Power-based enterprise servers, or servers that IBM’s partners might build.

The AIU is not a CPU or a GPU, but rather an application-specific integrated circuit (ASIC). Burns explained that instead of taking GPU technology and optimizing it for AI, the IBM approach has been designed from the ground up for AI. As such, the AIU has certain capabilities that aren’t always part of common AI accelerators. One is the ability to virtualize the AI acceleration services the AIU can enable.

In any cloud or enterprise environment, multiple workloads running on the same hardware need access to AI acceleration resources. Operators distribute access to that hardware through virtualization.

“GPUs were not virtualized at all in the beginning, and that is one place where a legacy design and a new design end up being starkly different,” Burns said. “If you take something that was not designed to be virtualized, and then try to modify it to support virtualization well, that can be a long journey and quite difficult.”

Burns explained that the IBM AIU has been designed to support enterprise virtualization, which includes the ability to be reliably multi-user and multi-tenant to ensure that workloads are isolated from one another.

IBM has also designed the AIU to be as compatible as possible with the vast majority of the software stack that modern data scientists use, including common tools such as the open-source PyTorch and TensorFlow technologies.
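The article doesn't detail how the AIU plugs into those frameworks, but the pattern such accelerators target is device-agnostic model code: the model definition stays the same and only the device selection changes. Here is a minimal PyTorch sketch of that pattern; the fallback between CUDA and CPU stands in for whatever backend name IBM's software stack ultimately exposes, which the article does not specify.

```python
import torch
import torch.nn as nn

# Device-agnostic pattern that accelerator backends plug into: the model
# code is unchanged and only the device string differs. CUDA/CPU is used
# here as a stand-in; the AIU's PyTorch device name is not public in
# this article.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the model once, then move it to whichever accelerator is present.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2)).to(device)

# Inputs are created on the same device, so the forward pass runs there.
x = torch.randn(4, 16, device=device)
logits = model(x)
print(logits.shape)  # shape is (4, 2) regardless of the backing device
```

Because the framework hides the hardware behind the device abstraction, data scientists can, in principle, keep their existing PyTorch or TensorFlow workflows while the accelerator changes underneath.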

Approximate computing is the secret sauce of the IBM AIU

A key innovation IBM is integrating into the AIU is a technique called approximate computing, which trades a small amount of numerical precision for better performance.

“Approximate computing is really the recognition that AI is not 100% correct,” Leland Chang, principal research staff member and senior manager, AI hardware, at IBM Research, told VentureBeat.

Chang explained that AI often works by recognizing a pattern and may well be just 99% accurate, meaning that 1% of results are incorrect. Approximate computing recognizes that, within the AI algorithm, it is possible to cut some corners. While Chang acknowledged that this reduces precision, he explained that if information is lost in the right places, it doesn't affect the result, which, more often than not, will still be 99% correct.

“Approximate computing, to some extent, is a phrase and a nice-sounding name, but it is simply recognizing that it doesn’t have to be 100% exact,” Chang said. “You’re losing some information, but you’re losing in places where it doesn’t matter.”
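The intuition can be shown with a toy NumPy sketch: crude uniform quantization shifts every value in a classifier's output slightly, yet the prediction (the largest value) is unchanged. The logits and the quantization scheme below are illustrative assumptions, not IBM's actual method.

```python
import numpy as np

# Hypothetical full-precision logits from a classifier (made-up values).
logits = np.array([2.3471, 0.1289, -1.5032, 4.9917], dtype=np.float32)

def quantize(x, bits=8):
    """Crude uniform quantization: snap values onto 2**bits evenly spaced
    levels spanning the observed range. An illustration of trading
    precision for cheaper arithmetic, not IBM's actual scheme."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**bits - 1)
    return np.round((x - lo) / scale) * scale + lo

approx = quantize(logits, bits=8)

# Every value moves a little (bounded by half a quantization step)...
max_error = np.max(np.abs(logits - approx))

# ...but the predicted class, the argmax, is the same: the information
# was lost "in places where it doesn't matter."
same_prediction = np.argmax(logits) == np.argmax(approx)
print(max_error, same_prediction)
```

The error here is bounded by half a quantization step, far smaller than the gap between the top two logits, which is why the answer survives the lost precision.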
