Meta seeks to accelerate AI inference with open-source AITemplate

Without inference, an artificial intelligence (AI) model is just math and does not actually execute or forecast much, if anything.

To date, AI inference engines have been largely tethered to the specific hardware for which they are designed. That degree of hardware lock-in means that developers need to build separate software for each hardware target, and it could well slow the pace of industry innovation overall.

The challenge of managing inference hardware has not been lost on social media giant Meta (formerly Facebook). Meta uses a lot of different hardware across its infrastructure and has its fair share of challenges implementing inference solutions. To help solve that challenge, Meta has been working on a technology it calls AITemplate (AIT), which it defines as a unified inference system that will initially support both Nvidia TensorCore and AMD MatrixCore inference hardware. Meta announced yesterday that it is open sourcing AITemplate under an Apache 2.0 license.

"Our current version of AIT is focused on support for Nvidia and AMD GPUs, but the platform is scalable and could support Intel GPUs in [the] future if demand was there," Ajit Mathews, director of engineering at Meta, told VentureBeat. "Now that we have open-sourced AIT, we welcome any silicon providers interested to contribute to it."

The need for GPU and inference engine abstraction

Concern about lock-in for AI hardware is not limited to inference engines; others in the industry, including Intel, have the same concern about GPUs for accelerated computing.

Intel is among the leading backers of the open-source SYCL specification, which seeks to help create a unified programming layer for GPUs. The Meta-led AIT effort is similar in concept, though different in what it enables. Mathews explained that SYCL is closer to the GPU programming level, while AITemplate is focusing on high-performance TensorCore/MatrixCore AI primitives.

"AIT is an alternative to TensorRT which is the Inference engine from Nvidia," Mathews said. "Unlike TensorRT, it is an open-source solution which supports both Nvidia and AMD GPU backends."

Mathews noted that AIT first characterizes the model architecture, and then works on fusing and optimizing layers and operations specific to that architecture.
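The general idea of fusing operations can be illustrated with a toy example. The sketch below is not AITemplate code and does not use its API; the function names (`scale`, `add_bias`, `relu`, `fuse`) and the three-step chain are hypothetical, chosen only to show why combining several elementwise steps into one pass over the data avoids intermediate results.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]

# Hypothetical toy "layers": each is a simple elementwise operation.
def scale(factor: float) -> Op:
    return lambda x: x * factor

def add_bias(bias: float) -> Op:
    return lambda x: x + bias

def relu(x: float) -> float:
    return max(x, 0.0)

def run_unfused(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply each op in its own pass, building an intermediate list every time."""
    passes = 0
    for op in ops:
        data = [op(x) for x in data]  # one full pass over the data per op
        passes += 1
    return data, passes

def fuse(ops: List[Op]) -> Op:
    """Compose a chain of elementwise ops into a single function."""
    def fused(x: float) -> float:
        for op in ops:
            x = op(x)
        return x
    return fused

def run_fused(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply the fused op in a single pass, with no intermediate lists."""
    fused = fuse(ops)
    return [fused(x) for x in data], 1

ops = [scale(2.0), add_bias(-3.0), relu]
inputs = [0.5, 1.0, 2.0, 4.0]

unfused_out, unfused_passes = run_unfused(ops, inputs)
fused_out, fused_passes = run_fused(ops, inputs)

print(unfused_out, unfused_passes)
print(fused_out, fused_passes)
```

Both paths compute the same values, but the fused path touches the data once instead of three times. On a GPU the analogous saving would come from fewer kernel launches and fewer round trips to memory, though how AIT achieves this internally is not described in the article.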

It's not about competition

AIT isn't just about creating a common software layer for inference; it's also about performance. In early tests, Meta is already seeing performance improvements over models run without AIT on both Nvidia and AMD GPUs.

"For AIT the goal is to bring flexible, open, more energy-efficient AI inference for GPU users," Mathews said.

Meta isn't building AIT just to serve the greater good, but also to meet its own AI needs. Mathews said that Meta's workloads are evolving, and to meet these changing needs the company needs solutions that are open and performant. He also noted that Meta tends to want the upper layers of its technology stacks to be hardware-agnostic. AIT does that today with AMD and Nvidia GPUs.

"We see opportunities with many of our current and future Inference workloads to benefit from AIT," he said. "We think AIT has the potential for broad adoption as the most performant unified inference engine."
