This article is part of a VB Lab Insights series on AI sponsored by Microsoft and Nvidia.
Don’t miss additional articles in this series providing new industry insights, trends and analysis on how AI is transforming organizations. Find them all here.
In July 2022, the AI world and popular press worldwide buzzed with the release of DALL-E 2, a 3.5-billion-parameter generative AI model developed by OpenAI. Then came ChatGPT.
Until then, flashy text-to-image models had grabbed much of the media and industry attention. But the December public introduction of the new interactive conversational chatbot (also developed and trained by OpenAI) brought another type of Large Language Model (LLM) into the spotlight.
Versatile LLMs are expanding fast
LLMs are learning algorithms that can recognize, summarize, translate, predict and generate language using very large text-based datasets, with little or no training supervision. They handle diverse tasks such as answering customer questions or recognizing and generating text, sounds and images with high accuracy. Besides text-to-image, a growing range of other modalities includes text-to-text, text-to-3D, text-to-video, digital biology and more.
Over the last two years, LLM neural networks have been quietly expanding AI’s impact in healthcare, gaming, finance, robotics, and other fields and functions, including enterprise development of software and machine learning. “Large language models have proven to be flexible and capable, able to answer deep domain questions, translate languages, comprehend and summarize documents, write stories and compute programs,” says Bryan Catanzaro, vice president of Applied Deep Learning Research at Nvidia.
The arrival of ChatGPT marked the clear emergence of a different kind of LLM as the foundation of generative AI, built on transformer neural networks (GPT stands for generative pre-trained transformer). These models are increasingly heralded as a revolutionary disrupter of AI, including enterprise applications.
“AI-first” infrastructures enable enterprise-grade LLMs
Originating in an influential research paper from 2017, the idea took off a year later with the release of Google's open-source BERT (Bidirectional Encoder Representations from Transformers) and OpenAI's early GPT models. As these pre-trained models have grown in complexity and size (roughly 10x annually in recent years), so have their capabilities and popularity. Today, the world's largest models, such as the 540-billion-parameter PaLM and the 530-billion-parameter Megatron, are LLMs. ChatGPT is built on OpenAI's GPT-3.5, introduced in late November 2022.
As one of the newest and most powerful classes of models, LLMs are increasingly displacing convolutional and recurrent networks. A key advancement has been combining specialized AI hardware, scale-friendly architectures, frameworks, customizable models and automation with robust "AI-first" infrastructures. That's making it feasible to deploy and scale production-ready LLMs within a wide range of mainstream commercial and enterprise-grade applications on public and private clouds and via APIs.
LLMs can help enterprises codify intelligence through learned knowledge across multiple domains, says Catanzaro. Doing so helps speed innovation that expands and unlocks the value of AI in ways previously available only on supercomputers.
Compelling new examples abound. Tabnine, for example, has created an AI assistant for software developers that runs multiple LLMs. The Tel Aviv-based company says it helps more than a million developers worldwide program faster in 20 programming languages and 15 editors, thanks to whole-line and full-function completions that automate up to 30% of code.
Tokyo-based Rinna employs LLMs to create chatbots used by millions in Japan, as well as tools to let developers build custom bots and AI-powered characters.
One of the best-known, most established examples is Microsoft Translator. The Azure-based service, with billions of parameters, came into the spotlight a decade ago, helping disaster workers understand Haitian Creole while responding to a 7.0 earthquake. The free personal translation app continues to evolve, and now supports text, voice, conversations, camera photos and screenshots in more than 70 languages.
How LLMs work
Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence each other.
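To make the attention idea concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation transformer models use to weigh how strongly each element of a sequence should attend to every other element, however far apart. The toy dimensions, random weights and NumPy implementation are illustrative assumptions, not any production model's code.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: learned projections."""
    q = x @ w_q                      # queries: what each token looks for
    k = x @ w_k                      # keys: what each token offers
    v = x @ w_v                      # values: the content to be mixed
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # pairwise similarity, scaled
    # Softmax over each row so attention weights sum to 1 per token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # each output is a weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # one context-aware vector per input token
```

Because the attention weights are computed between every pair of positions, a token at the start of a sequence can directly influence one at the end, which is what lets transformers capture the "distant data elements" mentioned above.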
How large models work in practice is straightforward. A typical example: Text generation and decoding is handled by GPT-3, an autoregressive language model that uses deep learning to produce human-like text. Codex, a descendant of GPT-3, writes code, adds comments and rewrites code for efficiency, among other tasks. The new NLLB-200 model handles translation across more than 200 languages.
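"Autoregressive" simply means the model predicts one token at a time and feeds each prediction back in as context for the next. The loop below sketches that decoding pattern; the tiny lookup-table "model" is a stand-in assumption for a real neural network's next-token prediction.

```python
# Stand-in next-token "model": a toy bigram lookup table.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "on", "on": "a", "a": "mat"}

def next_token(tokens):
    """Predict the next token from the current context (here: last token)."""
    return BIGRAMS.get(tokens[-1], "<eos>")

def generate(prompt, max_new_tokens=10):
    """Autoregressive decoding: append each prediction, then predict again."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == "<eos>":          # stop when the model ends the sequence
            break
        tokens.append(tok)
    return " ".join(tokens)

print(generate("the"))  # "the cat sat on a mat"
```

A real GPT-style model replaces the lookup table with a transformer that scores every token in its vocabulary, but the surrounding generate-and-append loop is the same.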
Much of the rapid progress over the last five years has been driven by the desire to create bigger and more powerful networks with less effort.
Overcoming power and scaling challenges with new, targeted technologies
Despite rapid, impressive advances in technology, size and performance, LLMs and sophisticated natural language models have been hard to develop, train, deploy and maintain, making them impractical or inaccessible to most enterprises.
Several challenges quickly arise when creating large models from scratch or customizing and fine-tuning them for a specific use case. Most critically, processing a huge collection of free-form text documents requires significant time and computing power, usually GPUs.
Explains Dave Salvator, director, Accelerated Computing at Nvidia: “What’s needed is computational power at scale to train and deploy LLMs. Performance directly impacts the total costs for training LLMs and the costs of deploying an LLM-powered product or service into production. Purpose-built software is also essential for distributed training and inference of these models using multiple GPUs across multiple nodes in a cluster. And because models and user demand vary in size, complexity, and intensity, flexibility to scale up or down is another key element.”
The latter is especially crucial. Commercial adoption of LLMs depends on a highly scalable infrastructure, along with the computing horsepower to deliver results in real time and an efficient inference-serving solution. An ongoing partnership between Microsoft and Nvidia is working to help enterprises meet these daunting demands. The industry giants are collaborating on products and integrations for training and deploying LLMs with billions or trillions of parameters. A key step is more tightly coupling the containerized Nvidia NeMo Megatron framework and a host of other targeted products with Microsoft Azure AI Infrastructure, which can deliver a scaling efficiency of 95% on 1,400 GPUs.
LLMs speed innovation in AI development and life sciences
As Tabnine found, speeding the development of software and AI applications is emerging as a high-value use case. Today’s generative AI technologies augment efforts by software engineers to optimize for productivity and accuracy.
NLP Cloud is an advanced software service that helps organizations fine-tune and deploy AI models; its LLMs enable easy text understanding and generation and entity extraction without DevOps.
While LLMs have helped AI understand human language, they're not limited to it. New developments are making it easier to train massive neural networks on biomolecular and chemical data. The ability to understand these "languages" lets researchers develop and deploy AI that can discover new patterns and insights in biological sequences and human health conditions. Thanks to these capabilities, top biotech and pharma companies have adopted Nvidia's forthcoming BioNeMo service to accelerate drug discovery research.
“With the ever-widening adoption of large language models in the protein space, the ability to efficiently train LLMs and quickly modulate model architectures is becoming hugely important,” explains Istvan Redl, machine learning lead at Peptone, a biotech startup in the Nvidia Inception program. “We believe that these two engineering aspects — scalability and rapid experimentation — are exactly what the BioNeMo framework could provide.”
Research from the Rostlab at the Technical University of Munich, along with work by a team from Harvard, Yale, New York University and others, is also helping scientists understand proteins and DNA/RNA and generate de novo chemical structures.
What’s next for Large Language Models and “Transformer AI”?
The creation of specialized frameworks, servers, software and tools has made LLMs more feasible and within reach, propelling new use cases. New advances are already driving a wave of innovation in AI and machine learning. The much-anticipated release of GPT-4 will likely deepen the growing belief that "Transformer AI" represents a major advancement that will radically change how AI systems are trained and built.
For enterprises, LLMs offer the promise of boosting AI adoption hindered by a shortage of workers to build models. With just a few hundred prompts, foundational LLMs can be easily leveraged by organizations without AI expertise — a huge plus.
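One common pattern behind this accessibility is few-shot prompting: a handful of worked examples embedded directly in the prompt steer a general-purpose foundation model toward a task with no fine-tuning or model-building expertise. The sketch below builds such a prompt; the sentiment-classification task, example comments and labels are hypothetical.

```python
# Hypothetical labeled examples an organization might already have on hand.
examples = [
    ("The delivery arrived two days early.", "positive"),
    ("My order was missing a part.", "negative"),
    ("The manual was clear and helpful.", "positive"),
]

def build_prompt(examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = ["Classify the sentiment of each customer comment."]
    for text, label in examples:
        lines.append(f"Comment: {text}\nSentiment: {label}")
    # Leave the final label blank for the model to complete.
    lines.append(f"Comment: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt(examples, "Support never answered my emails.")
print(prompt)  # ready to send to any instruction-following LLM API
```

The same template scales from a handful of examples to the "few hundred prompts" described above, which is why it lowers the barrier for teams without in-house model-training skills.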
Many analysts predict LLM technology and the industry will continue to mature and grow rapidly over the next decade. The last year has seen a slew of new large-scale models, including Megatron-Turing NLG, a 530-billion-parameter LLM released by Microsoft and Nvidia. The model is used internally for a wide variety of applications: reducing risk, identifying fraudulent behavior, cutting customer complaints, increasing automation and analyzing customer sentiment.
Ongoing research and commercialization are predicted to spawn all sorts of new models and applications in computational photography, education, and interactive experiences for mobile users. One running industry tally of startups includes more than 150 in generative AI alone.
"Customers continuously automate their text generation on gigantic GPT-3 models with an unmatched range of application, accuracy and latency," says Hugo Affaticati, technical program manager on AI & HPC benchmarking at Microsoft. "NeMo Megatron, combined with Azure's infrastructure, offers the scalability, adaptability, and great potential needed to solve always-evolving problems." Affaticati believes "the future of LLMs has never been brighter," noting that "Microsoft is committed to bringing the latest offerings to the cloud, such as the latest GPUs or models with trillions of parameters."
Robotic control is an especially promising frontier. Researchers now use transformer-based models to teach robots used in manufacturing, construction, autonomous driving and personal assistance. Some believe that powerful transformer models will continue to replace traditional convolutional AI models. A good example is TimeSformer, designed by researchers at Meta AI and Dartmouth, which uses transformers to analyze video.
Indeed, the “foundational models” of Transformer AI represent a potentially huge paradigm shift for AI. Unlike most of today’s LLMs, built and maintained for specific tasks, a single foundational model can be engineered to address a wide variety of tasks. Stanford University, for example, recently created a new center to explore the implications.
“The sheer scale and scope of foundation models over the last few years have stretched our imagination of what is possible,” Stanford researchers recently wrote, and promise “a wide range of beneficial applications for society.”
For enterprises, the practical value is certain to extend far beyond generating “artistic” images of Darth Vader ice fishing.
#MakeAIYourReality #AzureHPCAI #NVIDIAonAzure
VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact email@example.com.