Join top executives in San Francisco on July 11-12 to hear how leaders are integrating and optimizing AI investments for success.
There has been a lot of excitement (and hype) surrounding generative AI (artificial intelligence) in 2022. Social media platforms such as Twitter and Reddit are filled with images created by generative machine learning models such as DALL-E and Stable Diffusion. Startups building products on top of generative models are attracting funding despite the market downturn. And Big Tech companies are integrating generative models into their mainstream products.
Generative AI is not new. With a few notable exceptions, most of the technologies we’re seeing today have existed for several years. However, the convergence of several trends has made it possible to productize generative models and bring them to everyday applications. The field still has many challenges to overcome, but there is little doubt that the market for generative AI is bound to grow in 2023.
Scientific improvements in generative AI
Generative AI became popular in 2014 with the advent of generative adversarial networks (GANs), a type of deep learning architecture that could create realistic images, such as faces, from noise maps. Scientists later created other variants of GANs to perform other tasks, such as transferring the style of one image to another. GANs and variational autoencoders (VAEs), another deep learning architecture, later ushered in the era of deepfakes, an AI technique that modifies images and videos to swap one person’s face for another.
The year 2017 saw the advent of the transformer, the deep learning architecture underlying large language models (LLMs) such as GPT-3, LaMDA and Gopher. The transformer is used to generate text, software code and even protein structures. A variation of the transformer, the “vision transformer,” is also used for visual tasks such as image classification. An earlier version of OpenAI’s DALL-E used the transformer to generate images from text.
Transformers are scalable, which means their performance and accuracy improve as they are made larger and fed more data. More importantly, transformer models can be trained through unsupervised or self-supervised learning, meaning they require little or no human-annotated data. The need for such data has been one of the main bottlenecks of deep learning.
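To make the idea of self-supervised learning concrete, here is a minimal Python sketch of how training examples can be derived from raw text with no human labels: each word serves as the target for the words that precede it. The function name `next_token_pairs`, the whitespace tokenization and the `context_size` parameter are illustrative assumptions, not how any particular LLM is actually trained.

```python
def next_token_pairs(text, context_size=3):
    """Build (context, target) pairs from raw text alone.

    Hypothetical helper for illustration: real LLMs use subword
    tokenizers and far longer contexts.
    """
    tokens = text.split()  # whitespace tokenization is a simplification
    pairs = []
    for i in range(1, len(tokens)):
        # The "label" is simply the next token in the text itself.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("generative models learn from raw text")
for context, target in pairs:
    print(context, "->", target)
```

Because the targets come from the text itself, any large body of unlabeled text can in principle become training data, which is why annotation stops being the bottleneck.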
Contrastive Language-Image Pre-training (CLIP), a technique introduced by OpenAI in 2021, became pivotal in text-to-image generators. CLIP is very effective at learning shared embeddings between images and text by learning from image-caption pairs collected from the internet. CLIP and diffusion (another deep learning technique for generating images from noise) were used in OpenAI’s DALL-E 2 to generate high-resolution images with stunning detail and quality.
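The contrastive objective behind shared image-text embeddings can be sketched in a few lines of Python. This is a simplified illustration and not OpenAI's implementation: the embeddings are hand-written vectors rather than the output of neural encoders, the function names are hypothetical, and the temperature is fixed at an assumed value of 0.07 instead of being learned. The loss is low when each image embedding is most similar to its own caption and high when it is closer to another caption.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-caption pairs.

    Assumes image_embs[i] and text_embs[i] belong to the same pair.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # Similarity of every image to every caption, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # caption-to-image view
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy 2-D embeddings: captions aligned with their images, then swapped.
matched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
mismatched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(matched < mismatched)
```

Training on this kind of objective pulls matching images and captions together in the embedding space and pushes mismatched ones apart, which is what allows a text prompt to guide an image generator.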
As we moved toward 2022, better algorithms, larger models and bigger datasets helped improve the output of generative models, creating better images, writing high-quality software code and generating long stretches of (mostly) coherent text.
Discovering the right applications
Generative models were first presented as systems that could take on big chunks of creative work. GANs became famous for generating complete images with little input. LLMs like GPT-3 made the headlines for writing full articles.
But as the field has evolved, it has become evident that generative models are unreliable when left on their own. Many scientists agree that current deep learning models, no matter how large they are, lack some of the basic components of intelligence, which makes them prone to unpredictable mistakes.
Product teams are learning that generative models perform best when they are implemented in ways that give greater control to users.
The past year has seen several products that use generative models in smart, human-centric ways. For example, Copy AI, a tool that uses GPT-3 to generate blog posts, has an interactive interface in which the writer and the LLM write the outline of the article and flesh it out together.
Applications built with DALL-E 2 and Stable Diffusion also highlight user control with features that allow for editing, regenerating or configuring the output of the generative model.
As Douglas Eck, principal scientist at Google Research, said at a recent AI conference, “It’s no longer about a generative model that creates a realistic picture. It’s about making something that you created yourself. Technology should serve our need to have agency and creative control over what we do.”
Creating the right tools and infrastructure
In tandem with the algorithms and applications, the computational infrastructure and platforms for generative models have evolved. This has helped many companies integrate generative AI into their applications without the need for the specialized skills required to set up and run generative models.
Product teams with seasoned machine learning engineers can use open-source generative models such as BLOOM and Stable Diffusion. Meanwhile, teams that don’t have in-house machine learning talent can choose from a wide variety of solutions such as the OpenAI API, Microsoft Azure and Hugging Face Inference Endpoints. These platforms abstract away the complexities of setting up the models and running them at scale.
Also of note is the evolution of MLOps platforms, which are making it possible to set up complete pipelines for gathering feedback data, versioning datasets and models, and fine-tuning models for specific applications.
What’s next for generative AI?
The generative AI industry still has challenges to overcome, including ethical and copyright complications.
But it is interesting to see the generative AI space develop. For the moment, the main winners are Big Tech companies with data, compute power and an established market and products to deliver the added value of generative models. For example, Microsoft is taking advantage of its cloud infrastructure, its exclusive access to OpenAI’s technology and the huge market for its office and creativity tools to bring the power of generative models to its users.
Adobe is also preparing to integrate generative AI into its video and graphic design tools, and Google has several generative AI products in the works.
Down the road, however, the real power of generative AI might manifest itself in new markets. Who knows, maybe generative AI will usher in a new era of applications that we had never thought of before.