The most impressive thing about OpenAI’s natural language processing (NLP) model, GPT-3, is its sheer size. With 175 billion parameters (the learned weights that determine how the network processes text), the decoder-only transformer model blows its 1.5 billion-parameter predecessor, GPT-2, out of the water. This scale allows the model to generate surprisingly human-like text after being fed only a few examples of the task you want it to do.
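Being “fed a few examples” happens entirely in the prompt: the solved examples are written into the input text, and no model weights are updated. The minimal Python sketch below only assembles such a few-shot prompt. The helper name `build_few_shot_prompt` and the `Input:`/`Output:` labels are illustrative assumptions, not something OpenAI’s API requires, and the actual API call is left out. The translation pairs echo the illustration used in the GPT-3 paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Assemble a plain-text few-shot prompt (hypothetical helper).

    Layout: an instruction, a handful of solved input/output pairs,
    then the new input with its output left blank for the model.
    """
    lines = [instruction.strip(), ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between examples
    # End mid-pattern so the most likely continuation is the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

Switching the same model to a different task means changing only the instruction and the examples, which is the flexibility the generalized approach trades accuracy for.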
Its release in 2020 dominated headlines, and people scrambled to get on the waitlist for its API, hosted on OpenAI’s cloud service. Now, months later, as more users (myself included) have gained access to the API, interesting applications and use cases are popping up every day. For instance, Debuild.co has some compelling demos in which you build an application by giving the program a few simple instructions in plain English.
Despite the hype, questions persist as to whether GPT-3 will be the bedrock of an NLP application ecosystem or whether newer, stronger NLP models will knock it off its throne. As enterprises begin to imagine and engineer NLP applications, here’s what they should know about GPT-3 and its potential ecosystem.
GPT-3 and the NLP arms race
As I’ve described in the past, there are two broad approaches to pretraining an NLP model: generalized and ungeneralized.
An ungeneralized approach uses pretraining objectives that are aligned with a known use case. Basically, these models go deep on a smaller, more focused dataset rather than wide on a massive one. An example is Google’s PEGASUS model, which is built specifically for text summarization. PEGASUS is pretrained with an objective that closely resembles its final task and is then fine-tuned on text summarization datasets to deliver state-of-the-art results. The benefit of the ungeneralized approach is that it can dramatically increase accuracy on specific tasks. However, it is also significantly less flexible than a generalized model, and it still needs task-specific labeled training examples before it becomes accurate.
A generalized approach, in contrast, goes wide. That is what GPT-3’s 175 billion parameters are for: the model is pretrained on a vast corpus of text drawn largely from the internet. This allows GPT-3 to attempt almost any NLP task with just a handful of examples, though its accuracy is not always ideal. In fact, the OpenAI team highlights the limits of generalized pretraining and even concedes that GPT-3 has “notable weaknesses in text synthesis.”
OpenAI has decided that going bigger is the answer to accuracy problems, with each version of the model increasing the parameter count by one to two orders of magnitude. Competitors have taken notice. Google researchers recently released a paper describing the Switch Transformer, an NLP model with 1.6 trillion parameters. That is a staggering number, even though Switch Transformer is a sparse mixture-of-experts model that uses only a fraction of its parameters for any given token, and it could signal an arms race in generalized models. While these are far and away the two largest generalized models, Microsoft has Turing-NLG at 17 billion parameters and might be looking to join the race as well. Considering that training GPT-3 reportedly cost OpenAI almost $12 million, such an arms race could get expensive.
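The scale comparisons in this paragraph are easy to check with a little arithmetic. The sketch below uses the parameter counts cited in this article plus one labeled assumption: that each weight is stored in 16-bit floating point (2 bytes), which gives a rough lower bound on the memory needed just to hold the weights. These raw counts ignore architectural differences and say nothing about accuracy.

```python
# Parameter counts as cited in the article (raw totals, not active-per-token counts).
PARAMS = {
    "GPT-2": 1.5e9,
    "Turing-NLG": 17e9,
    "GPT-3": 175e9,
    "Switch Transformer": 1.6e12,
}

BYTES_PER_PARAM = 2  # assumption: 16-bit (fp16/bf16) weights


def growth(smaller: str, larger: str) -> float:
    """How many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


def weight_memory_gb(model: str) -> float:
    """Rough memory needed just to store the weights, in gigabytes (1e9 bytes)."""
    return PARAMS[model] * BYTES_PER_PARAM / 1e9


print(f"GPT-2 -> GPT-3: {growth('GPT-2', 'GPT-3'):.0f}x")                            # 117x
print(f"GPT-3 -> Switch Transformer: {growth('GPT-3', 'Switch Transformer'):.1f}x")  # 9.1x
print(f"GPT-3 weights at 2 bytes each: {weight_memory_gb('GPT-3'):.0f} GB")          # 350 GB
```

Roughly 350 GB of weights is more than fits in a single GPU’s memory, which is one reason models at this scale are typically served from multi-GPU clusters.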
Promising GPT-3 applications
GPT-3’s flexibility is what makes it attractive from an application-ecosystem standpoint: you can use it for just about any language task you can imagine. Predictably, startups have begun exploring how to use GPT-3 to power the next generation of NLP applications. Alex Schmitt at Cherry Ventures has compiled a list of interesting GPT-3 products.
Many of these applications are broadly consumer-facing, such as the “Love Letter Generator,” but there are also more technical applications, such as the “HTML Generator.” As enterprises consider how and where to incorporate GPT-3 into their business processes, some of the most promising early use cases are in healthcare, finance, and video meetings.
For enterprises in healthcare, financial services, and insurance, streamlining research is a huge need. The volume of data in these fields is growing rapidly, and it’s becoming nearly impossible for professionals to stay on top of it. NLP applications built on GPT-3 could comb through the latest reports, papers, and results and summarize the key findings in context, saving researchers time.
And as video meetings and telehealth became increasingly important during the pandemic, demand has risen for NLP tools that can be applied to them. Given a meeting transcript, GPT-3 offers the ability not just to take notes on an individual meeting but also to generate “too long; didn’t read” (TL;DR) summaries.
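Both of these use cases come down to summarizing text that is often longer than the model can read at once; GPT-3’s context window is 2,048 tokens. One common pattern, sketched below in Python under stated assumptions, is map-reduce summarization: split the text into chunks, summarize each chunk, then summarize the combined partial summaries. The `TL;DR:` prompt suffix follows the trick the GPT-2 paper used to elicit summaries without task-specific training. `complete` is a hypothetical placeholder for whatever function sends a prompt to the model and returns its text, the 1,000-word chunk size is an assumed budget that leaves room for the suffix and the generated summary, and meeting audio is assumed to have been transcribed already by a separate speech-to-text step.

```python
from typing import Callable, List

TLDR_SUFFIX = "\n\nTL;DR:"  # suffix that cues the model to summarize


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tldr_summarize(text: str,
                   complete: Callable[[str], str],
                   max_words: int = 1000) -> str:
    """Map step: summarize each chunk. Reduce step: summarize the summaries."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        return ""
    partials = [complete(chunk + TLDR_SUFFIX).strip() for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    return complete("\n".join(partials) + TLDR_SUFFIX).strip()


# Stand-in for a real model call (hypothetical; swap in your API client).
def fake_complete(prompt: str) -> str:
    return " " + prompt.split()[0]  # pretend the "summary" is the first word


print(tldr_summarize("alpha beta gamma delta epsilon", fake_complete, max_words=2))
```

A single reduce step is enough when the partial summaries together fit in the context window; for very long inputs, the same step can be applied again to the partial summaries.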
How enterprises and startups can build a moat
Despite these promising use cases, the major obstacle to a GPT-3 application ecosystem is how easily a copycat could replicate the performance of any application built on GPT-3’s API.
Everyone using GPT-3’s API gets the same NLP model, pretrained on the same data, so the key differentiator is the fine-tuning data an organization uses to specialize it for a use case. The more fine-tuning data you have, the more differentiated and sophisticated the output can become.
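Concretely, fine-tuning data for GPT-3 base models took the form of prompt/completion pairs, one JSON object per line (JSON Lines), in the format OpenAI documented for its original fine-tuning endpoint. The sketch below converts an organization’s in-house examples into that shape. The `###` separator, the ` END` stop marker, and the leading space on completions are assumptions based on conventions OpenAI’s guidance recommended at the time; the accepted format has since changed for newer models, so check the current documentation. The helper `to_jsonl` and the insurance-claim example are hypothetical.

```python
import json
from typing import Iterable, Tuple

SEPARATOR = "\n\n###\n\n"  # assumed marker for "prompt ends here"
STOP = " END"              # assumed marker for "completion ends here"


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (input, ideal output) pairs as prompt/completion JSON Lines."""
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt.strip() + SEPARATOR,
            # Leading space, because GPT's byte-pair tokenizer usually encodes
            # a word together with the space that precedes it.
            "completion": " " + completion.strip() + STOP,
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical in-house example: proprietary pairs like this are the moat.
examples = [("Claim: basement flooded after a burst pipe.",
             "Likely covered: sudden water damage.")]
print(to_jsonl(examples), end="")
```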
What does this mean? Organizations with more users or more data than their competitors will be better able to take advantage of GPT-3’s promise. GPT-3 won’t lead to disruptive startups; instead, it will allow enterprises and large organizations to optimize their offerings thanks to their incumbent advantage.
What does this mean for enterprises and startups moving forward?
Applications built on GPT-3’s API have only begun to scratch the surface of possible use cases, so we haven’t yet seen an ecosystem of interesting proofs of concept develop. How such an ecosystem would monetize and mature is also still an open question.
Because differentiation in this context requires fine-tuning, I expect enterprises to embrace GPT-3’s generalized approach for certain NLP tasks while sticking with ungeneralized models such as PEGASUS for more specific ones.
Additionally, as parameter counts grow exponentially among the big NLP players, we could see users shifting between ecosystems depending on who holds the lead at the moment.
Regardless of whether a GPT-3 application ecosystem matures or is superseded by another NLP model, enterprises should be excited at how easy it is becoming to create highly capable NLP models. They should explore use cases and consider how they can leverage their position in the market to quickly build value-adds for their customers and their own business processes.
Dattaraj Rao is Innovation and R&D Architect at Persistent Systems and author of the book Keras to Kubernetes: The Journey of a Machine Learning Model to Production. At Persistent Systems, he leads the AI Research Lab. He has 11 patents in machine learning and computer vision.
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.