OpenAI releases Triton, a programming language for AI workload optimization

OpenAI today released Triton, an open source, Python-like programming language that enables researchers to write highly efficient GPU code for AI workloads. Triton makes it possible to reach peak hardware performance with relatively little effort, OpenAI claims: in as few as 25 lines, it can produce code on par with what an expert could achieve.

Deep neural networks have emerged as an important type of AI model, capable of achieving state-of-the-art performance across natural language processing, computer vision, and other domains. The strength of these models lies in their hierarchical structure, which generates a large amount of highly parallelizable work well-suited for multicore hardware like GPUs. Frameworks for general-purpose GPU computing such as CUDA and OpenCL have made the development of high-performance programs easier in recent years. Yet GPUs remain especially challenging to optimize, in part because their architectures rapidly evolve.

Domain-specific languages and compilers have emerged to address the problem, but these systems tend to be less flexible and slower than the best handwritten compute kernels available in libraries like cuBLAS, cuDNN, or TensorRT. Reasoning about the low-level details that determine GPU performance, such as how memory is accessed and how work is scheduled across cores, can be challenging even for seasoned programmers. The purpose of Triton, then, is to automate these optimizations, so that developers can focus on the high-level logic of their code.

"Novel research ideas in the field of deep learning are generally implemented using a combination of native framework operators ... [W]riting specialized GPU kernels [can improve performance,] but [is often] surprisingly difficult due to the many intricacies of GPU programming. And although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, lack flexibility, [or] generate code noticeably slower than our hand-tuned baselines," Philippe Tillet, Triton's original creator, who now works at OpenAI as a member of the technical staff, wrote in a blog post. "Our researchers have already used [Triton] to produce kernels that are up to 2 times more efficient than equivalent Torch implementations, and we're excited to work with the community to make GPU programming more accessible to everyone."

Simplifying code

According to OpenAI, Triton -- which has its origins in a 2019 paper submitted to the International Workshop on Machine Learning and Programming Languages -- simplifies the development of specialized kernels that can be much faster than those in general-purpose libraries. Its compiler simplifies code and automatically optimizes and parallelizes it, converting it into code for execution on recent Nvidia GPUs. (CPUs, AMD GPUs, and platforms other than Linux aren't currently supported.)

"The main challenge posed by our proposed paradigm is that of work scheduling -- i.e., how the work done by each program instance should be partitioned for efficient execution on modern GPUs," Tillet explains on Triton's documentation website. "To address this issue, the Triton compiler makes heavy use of block-level data-flow analysis, a technique for scheduling iteration blocks statically based on the control- and data-flow structure of the target program. The resulting system actually works surprisingly well: our compiler manages to apply a broad range of interesting optimization automatically."
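As a rough illustration of the "program instance" idea Tillet describes, the plain-Python sketch below splits an element-wise addition into fixed-size blocks, with each instance handling one block and skipping any offsets that fall past the end of the data. This is not Triton code and does not use Triton's API. The names add_block and launch and the block size of 4 are hypothetical, chosen only for illustration, and the loop runs the instances one after another where a GPU would run them in parallel.

```python
# Plain-Python emulation of a block-based kernel. NOT Triton code:
# add_block, launch, and BLOCK_SIZE are hypothetical names for illustration.

BLOCK_SIZE = 4  # elements handled by one program instance (illustrative value)


def add_block(pid, x, y, out, n):
    """One program instance: add the block of elements selected by pid."""
    start = pid * BLOCK_SIZE
    for i in range(start, start + BLOCK_SIZE):
        if i < n:  # mask: skip offsets past the end of the data
            out[i] = x[i] + y[i]


def launch(x, y):
    """Run enough program instances to cover every element."""
    n = len(x)
    out = [0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_programs):  # sequential here; parallel on a GPU
        add_block(pid, x, y, out, n)
    return out


result = launch([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60])
print(result)  # [11, 22, 33, 44, 55, 66]
```

In this sketch, six elements with a block size of four need two instances, and the second one processes only two elements. Deciding how such blocks are sized and mapped onto hardware is the scheduling problem the quote says Triton's compiler handles automatically.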

The first stable version of Triton, along with tutorials, is available from the project's GitHub repository.
