Without inference, an artificial intelligence (AI) model is just math: it does not actually run or predict much, if anything.
To date, AI inference engines have been largely tethered to the specific hardware for which they are designed. That degree of hardware lock-in means that developers need to build specific software for different hardware, and it could well slow the pace of industry innovation overall.
The challenge of managing inference hardware has not been lost on social media giant Meta (formerly Facebook). Meta uses many different types of hardware across its infrastructure and has had its fair share of challenges implementing inference solutions. To help solve that problem, Meta has been working on a technology it calls AITemplate (AIT), which it defines as a unified inference system that will initially support both Nvidia TensorCore and AMD MatrixCore inference hardware. Meta announced yesterday that it is open sourcing AITemplate under an Apache 2.0 license.
“Our current version of AIT is focused on support for Nvidia and AMD GPUs, but the platform is scalable and could support Intel GPUs in [the] future if demand was there,” Ajit Mathews, director of engineering at Meta, told VentureBeat. “Now that we have open-sourced AIT, we welcome any silicon providers interested to contribute to it.”
The need for GPU and inference engine abstraction
Concern about AI hardware lock-in is not limited to inference engines; others in the industry, including Intel, have the same concern about GPUs for accelerated computing.
Intel is among the leading backers of the open-source SYCL specification, which seeks to help create a unified programming layer for GPUs. The Meta-led AIT effort is similar in concept, though different in what it enables. Mathews explained that SYCL is closer to the GPU programming level, while AITemplate is focusing on high-performance TensorCore/MatrixCore AI primitives.
“AIT is an alternative to TensorRT which is the Inference engine from Nvidia,” Mathews said. “Unlike TensorRT, it is an open-source solution which supports both Nvidia and AMD GPU backends.”
Mathews noted that AIT first characterizes the model architecture, and then works on fusing and optimizing layers and operations specific to that architecture.
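To make the idea of fusing operations concrete, here is a minimal toy sketch in Python. It is not AITemplate's API or Meta's implementation: the operator names, the `FUSION_PATTERNS` table and the `fuse_ops` function are all hypothetical, chosen only to show how a run of separate operations might be collapsed into a single fused kernel so that intermediate results do not have to make extra trips through GPU memory.

```python
# Toy illustration of operator fusion. This is NOT AITemplate's API;
# every name below is a hypothetical stand-in.
from typing import List, Tuple

# Hypothetical fusion patterns: a sequence of ops -> one fused kernel name.
# Longer patterns are listed first so they win over shorter ones.
FUSION_PATTERNS: List[Tuple[Tuple[str, ...], str]] = [
    (("gemm", "bias_add", "relu"), "gemm_bias_relu"),
    (("gemm", "bias_add"), "gemm_bias"),
]


def fuse_ops(ops: List[str]) -> List[str]:
    """Greedily replace known op sequences with a single fused op."""
    fused: List[str] = []
    i = 0
    while i < len(ops):
        for pattern, fused_name in FUSION_PATTERNS:
            if tuple(ops[i:i + len(pattern)]) == pattern:
                fused.append(fused_name)
                i += len(pattern)
                break
        else:
            # No pattern starts at this position; keep the op unchanged.
            fused.append(ops[i])
            i += 1
    return fused


model_ops = ["gemm", "bias_add", "relu", "gemm", "bias_add", "softmax"]
print(fuse_ops(model_ops))
```

In this sketch, six operations become three. A real inference engine would work on a full computation graph and generate hardware-specific kernels, which this list-based example does not attempt.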
It’s not about competition
AIT isn’t just about creating a common software layer for inference; it’s also about performance. In early tests, Meta is already seeing performance improvements over models running without AIT on both Nvidia and AMD GPUs.
“For AIT the goal is to bring flexible, open, more energy-efficient AI inference for GPU users,” Mathews said.
Meta isn’t building AIT just to serve the greater good, but also to meet its own AI needs. Mathews said that Meta’s workloads are evolving, and to meet these changing needs the company requires solutions that are open and performant. He also noted that Meta tends to want the upper layers of its technology stacks to be hardware-agnostic. AIT does that today with AMD and Nvidia GPUs.
“We see opportunities with many of our current and future Inference workloads to benefit from AIT,” he said. “We think AIT has the potential for broad adoption as the most performant unified inference engine.”