Why Transformers offer more than meets the eye

What do OpenAI's language-generating GPT-3 and DeepMind's protein shape-predicting AlphaFold have in common? Besides achieving leading results in their respective fields, both are built atop Transformer, an AI architecture that has gained considerable attention within the last several years. Dating back to 2017, Transformer has become the architecture of choice for natural language tasks, and it has demonstrated an aptitude for summarizing documents, translating between languages, and analyzing biological sequences.

Transformer has clear immediate business applications. OpenAI's GPT-3 is currently used in more than 300 apps by tens of thousands of developers, producing 4.5 billion words per day. DeepMind is applying its AlphaFold technology to identify cures for rare, neglected diseases. And more sophisticated applications are on the horizon, as demonstrated by research showing that Transformer can be tuned to play games like chess and even applied to image processing.

What are Transformers?

The Transformer architecture is made up of two core components: an encoder and a decoder. The encoder contains a stack of layers that process input data, such as text or images, one layer after another. Each encoder layer generates encodings that capture which parts of the input are relevant to each other, then passes those encodings on to the next layer, until the final encoder layer is reached.
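To make the layer-by-layer flow concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `toy_encoder_layer` that has no learned weights: it re-encodes each position as a relevance-weighted mix of every position (scaled dot products plus a softmax) and adds a residual connection. Real encoder layers also use learned projections, multiple attention heads, feed-forward sublayers and normalization, all of which are omitted here. The `encode` function shows only the stacking: each layer's output becomes the next layer's input.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_encoder_layer(inputs: List[Vector]) -> List[Vector]:
    """Hypothetical encoder layer with no learned weights: each position is
    re-encoded as a relevance-weighted mix of all positions, plus a residual."""
    dim = len(inputs[0])
    outputs = []
    for query in inputs:
        # How relevant is every input to this one? (scaled dot products)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in inputs]
        weights = softmax(scores)
        context = [sum(w * vec[d] for w, vec in zip(weights, inputs)) for d in range(dim)]
        # Residual connection: keep the original signal and add the context.
        outputs.append([x + c for x, c in zip(query, context)])
    return outputs


def encode(inputs: List[Vector], layers: List[Layer]) -> List[Vector]:
    # Each layer consumes the previous layer's encodings and hands its own on.
    encodings = inputs
    for layer in layers:
        encodings = layer(encodings)
    return encodings


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d "embeddings"
    print(encode(tokens, [toy_encoder_layer] * 3))
```

Because every layer takes and returns a list of same-sized vectors, layers can be stacked to any depth; real models simply use learned layers in place of the toy one.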

The decoder's layers do the same thing, but to the encoder's output. They take the encodings and use the contextual information they carry to generate an output sequence, whether text, a predicted protein structure, or an image.
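In practice, a decoder typically produces its output one element at a time, feeding what it has generated so far back in at each step. The sketch below shows that loop with greedy selection. It is a simplification under stated assumptions: `copy_step` is a hypothetical stand-in for a trained decoder, and the "encodings" are plain strings rather than the vectors a real encoder would produce.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder step: given the encoder's output and the tokens produced
# so far, return a score for every candidate next token.
DecoderStep = Callable[[Sequence[str], List[str]], Dict[str, float]]

START, END = "<s>", "</s>"


def greedy_decode(encodings: Sequence[str], step: DecoderStep, max_len: int = 20) -> List[str]:
    output = [START]
    for _ in range(max_len):
        scores = step(encodings, output)          # condition on encodings + prior outputs
        next_token = max(scores, key=scores.get)  # greedy: take the best-scoring token
        if next_token == END:
            break
        output.append(next_token)
    return output[1:]  # drop the start marker


def copy_step(encodings: Sequence[str], prefix: List[str]) -> Dict[str, float]:
    """Toy stand-in for a trained decoder: emit the source item at the current
    position, then END. A real decoder would compute these scores with attention."""
    position = len(prefix) - 1
    target = encodings[position] if position < len(encodings) else END
    vocab = set(encodings) | {END}
    return {tok: (1.0 if tok == target else 0.0) for tok in vocab}
```

For example, `greedy_decode(["le", "chat", "dort"], copy_step)` returns `["le", "chat", "dort"]`. Greedy selection is only one option; systems often sample or use beam search instead.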

Each encoder and decoder layer makes use of an "attention mechanism," which is what distinguishes Transformer from other architectures. For every input, attention weighs the relevance of every other input and draws on them to generate the output. Each decoder layer also has an additional attention mechanism: after attending to the outputs the decoder has produced so far, the layer draws information from the encodings to produce its output.
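The attention described above is usually implemented as scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. The minimal sketch below computes it with plain lists. A `causal` flag hides later positions, which is how a decoder attends only to the outputs produced so far, and `decoder_layer_attention` (a hypothetical helper name) chains that masked self-attention with cross-attention over the encodings. For simplicity it uses the raw vectors as queries, keys and values; real models first apply learned projections and split the work across several heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0, so masked slots get no weight
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix, causal: bool = False) -> Matrix:
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, one query row at a time.
    With causal=True, position i may only look at positions 0..i."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        scores = [
            float("-inf") if causal and j > i
            else sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)  # relevance of every key to this query, summing to 1
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def decoder_layer_attention(decoder_states: Matrix, encodings: Matrix) -> Matrix:
    # 1) Masked self-attention: each position sees only the outputs produced so far.
    h = attention(decoder_states, decoder_states, decoder_states, causal=True)
    # 2) Cross-attention: queries come from the decoder, keys and values from the encodings.
    return attention(h, encodings, encodings)
```

Encoder self-attention is the same function called as `attention(x, x, x)` without the mask.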

Transformers typically undergo semi-supervised learning that involves unsupervised pretraining, followed by supervised fine-tuning. Residing between supervised and unsupervised learning, semi-supervised learning accepts data that's partially labeled or where the majority of the data lacks labels. In this case, Transformers are first exposed to unlabeled data for which no predefined labels exist, and they must teach themselves by learning the data's inherent structure, for example by predicting missing or upcoming words. During fine-tuning, Transformers train on labeled datasets so they learn to accomplish particular tasks, like answering questions, analyzing sentiment, and paraphrasing documents.
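The reason pretraining needs no human labels is that the training targets come from the text itself. The sketch below turns raw text into (context, target) pairs in two common ways: predicting the next word (the approach GPT-style models use) and recovering a hidden word (a simplified version of BERT-style masking). The function names and the `LABELED_SENTIMENT` examples are hypothetical, included only to contrast self-generated targets with the human-supplied labels used in fine-tuning.

```python
from typing import List, Tuple

Example = Tuple[Tuple[str, ...], str]


def next_word_examples(text: str, context_size: int = 3) -> List[Example]:
    """GPT-style objective: predict each word from the words before it.
    The 'label' is simply the next word, so raw, unlabeled text is enough."""
    words = text.split()
    return [
        (tuple(words[max(0, i - context_size):i]), words[i])
        for i in range(1, len(words))
    ]


def masked_word_examples(text: str, mask: str = "[MASK]") -> List[Example]:
    """Simplified BERT-style objective: hide one word and predict it from both sides.
    (BERT itself selects a random ~15% of tokens; this masks each word in turn.)"""
    words = text.split()
    return [(tuple(words[:i] + [mask] + words[i + 1:]), word) for i, word in enumerate(words)]


# Fine-tuning data, by contrast, needs labels a person supplied (hypothetical examples).
LABELED_SENTIMENT = [
    ("the support team was great", "positive"),
    ("my order never arrived", "negative"),
]

if __name__ == "__main__":
    for context, target in next_word_examples("transformers learn from raw text", 2):
        print(context, "->", target)
```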

It's a form of transfer learning: storing knowledge gained while solving one problem and applying it to a different but related problem. The pretraining step helps the model learn general features that can be reused on the target task, which typically boosts its accuracy.
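A toy version of this reuse, assuming a hypothetical table of "pretrained" word features that stays frozen, might look like the sketch below. Only a small logistic-regression head is trained on a handful of labeled examples, and the knowledge baked into the frozen features lets it handle word combinations it never saw during fine-tuning. Real fine-tuning usually updates some or all of the pretrained weights as well; freezing them here keeps the example short.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical "pretrained" word features standing in for what a model learned
# during pretraining. They are frozen: fine-tuning below never changes them.
PRETRAINED: Dict[str, List[float]] = {
    "great": [1.0, 0.2], "love": [0.9, 0.1], "fast": [0.6, 0.0],
    "broken": [-1.0, 0.3], "late": [-0.7, 0.1], "refund": [-0.5, 0.6],
}


def features(text: str) -> List[float]:
    # Reuse the frozen features: average the vectors of known words.
    vecs = [PRETRAINED[w] for w in text.split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fine_tune_head(data: List[Tuple[str, int]], epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only a small logistic-regression 'head' on labeled examples."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label  # d(log loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(text: str, w: List[float], b: float) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, features(text))) + b > 0)


if __name__ == "__main__":
    labeled = [("great service", 1), ("love it", 1), ("arrived late", 0), ("screen broken", 0)]
    w, b = fine_tune_head(labeled)
    print(predict("fast and great", w, b), predict("broken and late refund", w, b))
```

Neither "fast" nor "refund" appears in the four labeled examples; the head classifies them correctly only because the frozen features already encode how those words relate to the ones it was trained on.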

Attention has the added benefit of speeding up training. Because Transformers don't process a sequence one element at a time, as recurrent neural networks do, their computations are more easily parallelized, and ever-larger models can be trained with significant, but not unattainable, increases in compute. Running on 16 of Google's custom-built TPUv3 processors, AlphaFold took a few weeks to train, while OpenAI's music-generating Jukebox took over a month across hundreds of Nvidia V100 graphics cards.
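The difference in dependency structure can be sketched in a few lines. In the toy recurrent network below, step t needs the hidden state from step t-1, so positions must be processed in order. In the toy one-dimensional self-attention (a deliberate simplification with scalar inputs and hypothetical function names), each position's output depends only on the inputs, so all positions can be handed out at once. Note that CPython threads will not actually speed up this pure-Python arithmetic; the sketch illustrates the independence that frameworks exploit on GPUs and TPUs through large batched matrix multiplications.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List


def rnn_states(xs: List[float], w: float = 0.5) -> List[float]:
    """Toy recurrent network: h_t = tanh(w * h_{t-1} + x_t).
    Step t needs h_{t-1}, so the positions must be processed one after another."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * h + x)
        states.append(h)
    return states


def attention_output(i: int, xs: List[float]) -> float:
    """Toy 1-d self-attention for position i. It reads only the inputs, never
    another position's output, so every position can be computed independently."""
    scores = [xs[i] * xj for xj in xs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * xj for e, xj in zip(exps, xs))


def attention_parallel(xs: List[float], workers: int = 4) -> List[float]:
    # No position waits on another, so all of them can be farmed out at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(attention_output, range(len(xs)), [xs] * len(xs)))
```

Computing the attention outputs in parallel gives exactly the same result as computing them one by one, whereas there is no way to start the recurrent network's last step before its earlier ones finish.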

The business value of Transformers

Transformers have been widely deployed in the real world. The startup Viable is using the Transformer-powered GPT-3 to analyze customer feedback, identifying themes and sentiment from surveys, help desk tickets, live chat logs, reviews, and more. Algolia, another startup, is using it to improve its web search products.

More exciting use cases lie beyond the language domain. In January, OpenAI took the wraps off DALL-E, a text-to-image engine that's essentially a visual idea generator. Given a text prompt, it generates images to match the prompt, filling in the blanks when the prompt implies the image must contain a detail that isn't explicitly stated.

OpenAI predicts that DALL-E could someday augment, or even replace, 3D rendering engines. For example, architects could use the tool to visualize buildings, while graphic artists could apply it to software and video game design. Another point in DALL-E's favor is that the Transformer-driven tool can combine disparate ideas to synthesize objects, some of which are unlikely to exist in the real world, like a hybrid of a snail and a harp.

"DALL-E shows creativity, producing useful conceptual images for product, fashion, and interior design," Gary Grossman, global lead at Edelman's AI center of excellence, wrote in a recent blog post. "DALL-E could support creative brainstorming ... either with thought starters or, one day, producing final conceptual images. Time will tell whether this will replace people performing these tasks or simply be another tool to boost efficiency and creativity."

We will eventually see Transformer-based models that go one step further, synthesizing not just pictures but videos from whole cloth. These types of systems have already been described in the academic literature. Other related applications, some already in use and others likely soon, include generating realistic voices, recognizing speech, parsing medical records, predicting stock prices, and creating computer code.

Indeed, Transformers have immense potential in the enterprise, which is one of the reasons the global AI market is anticipated to be worth $266.92 billion by 2027. Transformer-powered apps could free workers from menial tasks to spend their time on more meaningful work, bolstering productivity. The McKinsey Global Institute predicts technology like Transformers will add 1.2% to annual gross domestic product (GDP) growth for the next 10 years and help capture an additional 20% to 25% in net economic benefits ($13 trillion globally) in the next 12 years.

Businesses that ignore the potential of Transformers do so at their peril.