In a paper published this week on the preprint server Arxiv.org, scientists at Google, DeepMind, the Alan Turing Institute, and the University of Cambridge propose Performer, an AI model architecture that scales linearly with input sequence length and performs well on tasks like protein sequence modeling. They claim that it has the potential to impact research on biological sequence analysis while lowering compute costs and computational complexity, which would also reduce energy consumption and, in turn, carbon emissions.

Performer is an offshoot of Transformer, an architecture proposed by Google researchers in 2017. Transformers rely on a trainable attention mechanism that specifies dependencies between elements of each input sequence (for instance, amino acids within a protein). It’s this that enables them to achieve state-of-the-art results in areas of machine learning including natural language processing, neural machine translation, document generation and summarization, and image and music generation. But Transformers scale quadratically with the number of tokens (i.e., sequences of characters) in an input sequence, which is prohibitively expensive for long sequences.
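To make the quadratic cost concrete, here is a minimal NumPy sketch of standard single-head attention. It is an illustration rather than code from the paper; the function name `softmax_attention` and the sizes `n` and `d` are assumptions chosen for the example. The point to notice is that the intermediate `weights` matrix has one entry per pair of tokens.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention for a single head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # shape (n, n): one score per token pair
    scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

out, weights = softmax_attention(Q, K, V)
print(out.shape, weights.shape)
```

Doubling `n` quadruples the number of entries in `weights`, which is where the quadratic growth in time and memory comes from.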

By contrast, Performers scale linearly with the number of tokens in an input sequence. Their backbone is fast attention via orthogonal random features (FAVOR), a technique that approximates the attention matrix using random feature maps whose random projection vectors are constrained to be orthogonal to one another, which preserves each vector’s marginal distribution while making the approximation more accurate. This lets Performers handle long sequences and remain backward-compatible with pretrained regular Transformers, allowing them to be used beyond the scope of Transformers as a more scalable replacement for attention in computer vision, reinforcement learning, and other AI applications.
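The general idea of random-feature attention can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses positive exponential features, as in the later FAVOR+ formulation, and the helper names (`orthogonal_projections`, `positive_features`, `linear_attention`), the sizes, and the 0.3 input scaling are all hypothetical choices for the example.

```python
import numpy as np

def orthogonal_projections(m, d, rng):
    """Draw m projection vectors in R^d, orthogonal within each block of d."""
    blocks = []
    for _ in range((m + d - 1) // d):
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append(q.T)                        # rows are orthonormal
    W = np.vstack(blocks)[:m]
    # Rescale rows so their lengths are distributed like Gaussian vectors.
    lengths = np.linalg.norm(rng.standard_normal((m, d)), axis=1)
    return W * lengths[:, None]

def positive_features(X, W):
    """Feature map phi with E[phi(x) . phi(y)] = exp(x . y)."""
    m = W.shape[0]
    half_sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    return np.exp(X @ W.T - half_sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, W):
    """Approximate softmax attention without forming the (n, n) matrix."""
    scale = Q.shape[-1] ** -0.25                  # so q.k is divided by sqrt(d)
    Qp = positive_features(Q * scale, W)          # (n, m)
    Kp = positive_features(K * scale, W)          # (n, m)
    KV = Kp.T @ V                                 # (m, d): cost linear in n
    normalizer = Qp @ Kp.sum(axis=0)              # (n,)
    return (Qp @ KV) / normalizer[:, None]

def exact_attention(Q, K, V):
    """Reference softmax attention, used here only for comparison."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ V

# Illustrative sizes; small-magnitude inputs keep the estimator's variance low.
rng = np.random.default_rng(0)
n, d, m = 64, 16, 512
Q = 0.3 * rng.standard_normal((n, d))
K = 0.3 * rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

W = orthogonal_projections(m, d, rng)
approx = linear_attention(Q, K, V, W)
exact = exact_attention(Q, K, V)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.3f}")
```

Because the keys and values are first combined into a small `(m, d)` matrix and only then multiplied by the query features, no array in `linear_attention` grows with the square of the sequence length.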

To evaluate the architecture, the researchers implemented Performer on top of pre-existing Transformer training code designed to model protein interactions. (Performer replaced only the attention component, while all other components remained the same.) Both Performer- and Transformer-based baseline models were fed concatenated protein sequences 8,192 tokens in length from the open source database TrEMBL, and they were trained on Google-designed third-generation tensor processing units (TPUs) containing 16GB of RAM per chip.

The researchers report that the Transformer-based models overloaded the chips’ memory even at a batch size of 1 per chip. Performer, on the other hand, trained efficiently at a batch size of 16 per chip, and its performance kept improving as training progressed.
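A rough back-of-the-envelope calculation suggests why sequences of 8,192 tokens strain memory under standard attention. The sketch below assumes 32-bit floats and, for the linear case, a hypothetical 256 random features; the article does not give the model's number of heads or layers, so the figures cover a single attention matrix only and are not a reconstruction of the experiment.

```python
def attention_matrix_bytes(seq_len, bytes_per_value=4):
    """Memory for one (seq_len x seq_len) attention matrix."""
    return seq_len * seq_len * bytes_per_value

def feature_matrix_bytes(seq_len, num_features, bytes_per_value=4):
    """Memory for one (seq_len x num_features) random-feature matrix."""
    return seq_len * num_features * bytes_per_value

n = 8192                      # sequence length reported in the article
m = 256                       # hypothetical number of random features

quadratic_mib = attention_matrix_bytes(n) / 2**20
linear_mib = feature_matrix_bytes(n, m) / 2**20
print(f"full attention matrix: {quadratic_mib:.0f} MiB")   # 256 MiB
print(f"random-feature matrix: {linear_mib:.0f} MiB")      # 8 MiB
```

The full matrix must be held once per head, per layer, and per sequence in the batch, and training also stores activations for the backward pass, so the total multiplies quickly.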

The results show Performer could benefit modern bioinformatics “immensely” by scaling up methods to train faster, more accurate AI models, the coauthors say. “[This] opens the door to the ability to design sets of molecules with pre-specified interaction properties. These approaches could be used to augment existing physics-based design strategies that are of critical importance for example in the development of new nanoparticle vaccines,” they wrote.

Notably, Performer follows the introduction of Reformer, an evolution of Transformer that Google designed to handle context windows of up to 1 million words. By leveraging techniques like locality-sensitive hashing (LSH) and reversible residual layers to use memory efficiently and reduce complexity over long sequences, it’s able to run on a single AI accelerator chip using only 16GB of memory.

For its part, OpenAI recently debuted Sparse Transformers, an open source machine learning system that can predict what comes next in text, image, and sound sequences 30 times longer than what’s possible with Transformers. Sparse Transformers form the foundation of Jukebox, a machine learning framework that generates music, including rudimentary songs, as raw audio in a range of genres and musical styles.

