Does your enterprise plan to try out GPT-3? Here’s what you should know

In a previous article, I talked about the market advantages enterprises could reap by developing applications using OpenAI’s GPT-3 natural language model. Here I want to provide a bit of a primer for companies taking a first look at the technology.

There’s currently a waiting list to gain access to the GPT-3 API, but I’ve had an opportunity to play around in the system. For those who haven't tried it out yet, here are a few things to be prepared for:

1. Context is everything

The input you give GPT-3 is some seed text that frames the task for the model. This is the context you're setting for GPT-3's response. You also provide a "prefix" for its response: a direction, marked with a colon at the end, that controls the text the model generates. For example, you can give a paragraph as context and use a prefix like “Explain to a 5-year-old:” to generate a simple explanation. (It is highly recommended not to add any space after the prefix.) Below is a sample response from GPT-3.

As you can see in the above example, your prefix doesn’t need to follow any complex machine-readable encoding. It is just a simple human-readable phrase.
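As a rough illustration, here is a minimal Python sketch of assembling a context and a prefix into a single prompt string. The helper name `build_prompt` and the blank line between context and prefix are assumptions for illustration, not a format OpenAI requires; the one rule carried over from above is that nothing follows the prefix's colon.

```python
def build_prompt(context: str, prefix: str) -> str:
    """Join seed context and a prefix into one prompt (hypothetical helper)."""
    prefix = prefix.rstrip()  # no space after the prefix, per the advice above
    if not prefix.endswith(":"):
        prefix += ":"  # prefixes are marked with a trailing colon
    # Assumption: a blank line separates the context from the prefix.
    return f"{context.strip()}\n\n{prefix}"


passage = (
    "Photosynthesis is the process by which plants use sunlight, water "
    "and carbon dioxide to make glucose and release oxygen."
)
prompt = build_prompt(passage, "Explain to a 5-year-old: ")
print(repr(prompt))
```

Note that the trailing space in the usage example is stripped, so the prompt ends exactly at the colon.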

You can use multiple prefixes to describe a larger or extended context, as in a chatbot, where you want to provide a history of the chat to help the bot generate responses. This context tunes GPT-3's output: for instance, you could make the chatbot helpful and friendly, or you could make it assertive and unfriendly. In the example below, I've given GPT-3 four prefixes. I've provided sample output for the first three and then left GPT-3 to continue from there.

Since the output you get from the model depends entirely on the context you provide, it's important to construct these elements carefully.
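A chat prompt of this kind can be sketched in Python as a persona line, the earlier turns each introduced by a speaker prefix, and a final prefix left open for the model. The speaker labels "Human" and "AI", the persona wording, and the helper name are illustrative assumptions, not a format GPT-3 requires.

```python
def build_chat_prompt(persona: str, turns: list[tuple[str, str]], next_speaker: str = "AI") -> str:
    """Lay out a persona, the chat history, and an open prefix (hypothetical helper)."""
    lines = [persona.strip(), ""]
    for speaker, text in turns:
        # Each earlier turn is a "Speaker:" prefix followed by sample output.
        lines.append(f"{speaker}: {text.strip()}")
    # The last prefix is left open, with no trailing space, for GPT-3 to continue.
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


history = [
    ("Human", "Hello, who are you?"),
    ("AI", "I am an AI assistant. How can I help you today?"),
    ("Human", "Can you recommend a book on machine learning?"),
]
prompt = build_chat_prompt(
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, and very friendly.",
    history,
)
print(prompt)
```

Rewriting the persona line (for example, describing the assistant as assertive and unfriendly) is the lever for changing the bot's tone.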

2. Configure carefully or risk your tokens

Configurations are the settings shown at right in the examples above. These are parameters that you include with your API call to help tune the response. For example, you can change the randomness of responses using the Temperature setting, which ranges from 0 to 1. If Temperature is set to 0, every call you make with the same context will return essentially the same response. If Temperature is 1, the response will be highly randomized.
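This article does not describe OpenAI's decoding internals, but in language-model sampling, temperature conventionally rescales the model's scores for candidate next tokens before a softmax. The Python sketch below shows that standard technique on made-up scores for three tokens; the function name and the numbers are illustrative assumptions.

```python
import math
import random
from typing import Optional


def sample_with_temperature(logits: list[float], temperature: float,
                            rng: Optional[random.Random] = None) -> int:
    """Pick a token index from raw scores, scaled by temperature."""
    if rng is None:
        rng = random.Random()
    if temperature <= 0:
        # Temperature 0 degenerates to greedy decoding: always the top-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Low temperatures sharpen the distribution; high ones flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)
greedy = {sample_with_temperature(logits, 0.0, rng) for _ in range(100)}
varied = {sample_with_temperature(logits, 1.0, rng) for _ in range(100)}
print(greedy)  # {0}
print(varied)
```

At temperature 0 the same token wins every time, which is why repeated calls return the same text; at 1, lower-scoring tokens are picked some of the time.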

Another setting you can tune is Response Length, which limits how much text the API sends back. Keep in mind that OpenAI charges for use of the platform per token rather than per word, and a token usually corresponds to about four characters of English text. So, in the testing phase, make sure to tune your response length so you don't use up all of your tokens right away.
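To budget during testing, a back-of-the-envelope estimate is often enough. The Python sketch below applies the four-characters-per-token rule of thumb; it assumes, as with OpenAI's completion pricing, that the prompt's tokens are billed along with the response's. Real tokenizers vary with the text, so the function names and results here are hypothetical estimates, not OpenAI's count.

```python
import math

# Rule of thumb: roughly four characters of English text per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count of a piece of text (estimate only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def max_tokens_per_call(prompt: str, response_length: int) -> int:
    """Worst-case tokens for one call, assuming both the prompt and the
    full Response Length allowance are counted."""
    return estimate_tokens(prompt) + response_length


prompt = "x" * 400  # a 400-character prompt
print(estimate_tokens(prompt))            # 100
print(max_tokens_per_call(prompt, 64))    # 164
```

Lowering Response Length shrinks the second term directly, which is the quickest way to stretch a trial budget.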

With the three-month free trial of GPT-3, you get $18 worth of tokens. I ended up consuming almost 75% of mine just experimenting with the API. There are actually four different versions of the GPT-3 model available as "engines," and each has its own price. As of today, tokens cost $0.06 per thousand for the DaVinci engine, the best-performing of the four. The less user-friendly engines, Curie, Babbage, and Ada, cost $0.006, $0.0012, and $0.0008 per thousand tokens, respectively.
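The arithmetic behind those prices is easy to script. This Python sketch hard-codes the per-thousand-token prices quoted above (they will change, so treat them as a snapshot) and assumes, purely for illustration, about 500 tokens per experimental call. `Decimal` avoids floating-point rounding surprises with small dollar amounts.

```python
from decimal import Decimal

# Per-1,000-token prices quoted in this article; check current pricing before relying on them.
PRICE_PER_1K_TOKENS = {
    "davinci": Decimal("0.06"),
    "curie": Decimal("0.006"),
    "babbage": Decimal("0.0012"),
    "ada": Decimal("0.0008"),
}

TRIAL_CREDIT_USD = Decimal("18")


def cost_usd(engine: str, tokens: int) -> Decimal:
    """Dollar cost of a given number of tokens on one engine."""
    return PRICE_PER_1K_TOKENS[engine] * tokens / 1000


def calls_within_budget(engine: str, tokens_per_call: int,
                        budget: Decimal = TRIAL_CREDIT_USD) -> int:
    """How many whole calls of a given size fit in the budget."""
    return int(budget // cost_usd(engine, tokens_per_call))


for engine in PRICE_PER_1K_TOKENS:
    # Assumption: roughly 500 tokens (prompt plus response) per call.
    print(engine, calls_within_budget(engine, 500))
# davinci 600
# curie 6000
# babbage 30000
# ada 45000
```

For scale, 75% of the $18 credit is $13.50, which buys 225,000 tokens at DaVinci's price.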

3. MLaaS will be bigger than SaaS

GPT-3 is probably the most famous example of an advanced natural-language-processing API, but it’s likely to become one of many as the NLP ecosystem matures. Machine learning as a service (MLaaS) is a powerful business model because you can either spend the time and money to pre-train a model yourself (for context, GPT-3 cost OpenAI nearly $12 million to train), or you can purchase a pre-trained model for pennies on the dollar.

In GPT-3’s case, every call you make to the API is routed to some shared instance of the GPT-3 model running in OpenAI’s cloud. As mentioned earlier, the DaVinci engine performs best, but you should experiment for yourself with each engine for specific use cases.

DaVinci is forgiving if your input context has spelling mistakes or extra or missing spaces, and it gives very readable responses. You can sense it has been trained on a larger corpus and is resilient to errors. The cheaper engines need you to do more work to frame the context and usually need tuning to get exactly the kind of response you expect. Below is an example of classifying companies where the name FedEx is misspelled as FedExt in the context. DaVinci gets the right response, while Ada gets it wrong.
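One common way to do that framing work for a cheaper engine is a few-shot prompt: show several labeled examples before the item you want classified. The Python sketch below builds such a prompt; the instruction line, the example companies, and their industry labels are hypothetical and do not reproduce the article's example.

```python
# Hypothetical labeled examples; swap in your own categories.
EXAMPLES = [
    ("Walmart", "Retail"),
    ("JPMorgan Chase", "Banking"),
    ("Pfizer", "Pharmaceuticals"),
]


def build_classification_prompt(company: str, examples=EXAMPLES) -> str:
    """Few-shot prompt: labeled examples first, then the open query."""
    lines = ["Classify each company by industry.", ""]
    for name, label in examples:
        lines.append(f"Company: {name}")
        lines.append(f"Industry: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Company: {company}")
    lines.append("Industry:")  # left open for the model to complete
    return "\n".join(lines)


print(build_classification_prompt("FedExt"))
```

More examples, or examples closer to the misspellings you expect, give a less forgiving engine more to go on, at the cost of extra prompt tokens.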

4. Models will be built on top of each other like Russian dolls

GPT-3 is a stateless language model, which means it doesn't remember your previous requests or learn from them. It relies solely on its original training (on a vast corpus of text drawn largely from the internet) and on the context and configuration you provide it.
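Because the model keeps no memory between calls, any conversation history has to live in your application and be resent with every request. The Python sketch below is a hypothetical client-side helper (`ChatSession` is not part of any OpenAI library) that does this and drops the oldest turns when the prompt grows too long. It budgets in characters as a stand-in for tokens, an assumption based on the four-characters-per-token rule of thumb.

```python
class ChatSession:
    """Hypothetical helper that keeps chat history on the client side,
    since a stateless model forgets everything between calls."""

    def __init__(self, persona: str, max_prompt_chars: int = 2000):
        self.persona = persona.strip()
        # Characters stand in for tokens here (roughly four characters per token).
        self.max_prompt_chars = max_prompt_chars
        self.turns: list[str] = []

    def add(self, speaker: str, text: str) -> None:
        """Record one turn of the conversation."""
        self.turns.append(f"{speaker}: {text.strip()}")

    def prompt(self, next_speaker: str = "AI") -> str:
        """Rebuild the full prompt for the next call, dropping the oldest
        turns until it fits within the character budget."""
        turns = list(self.turns)
        while True:
            text = "\n".join([self.persona, *turns, f"{next_speaker}:"])
            if len(text) <= self.max_prompt_chars or not turns:
                return text
            turns.pop(0)  # forget the oldest turn first


session = ChatSession("The assistant is helpful.", max_prompt_chars=60)
session.add("Human", "Hi")
session.add("AI", "Hello!")
session.add("Human", "What is GPT-3?")
print(session.prompt())
# The assistant is helpful.
# Human: What is GPT-3?
# AI:
```

The stored history is left intact; only the prompt sent for this call is trimmed, so a larger budget later can still use older turns.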

That reliance on its original training is the major hurdle to enterprise adoption. You can generate some very interesting demos, but for GPT-3 to be a serious contender for real-world use cases in banking, healthcare, industry, and so on, we will need to train domain-specific models. For example, you would want a model trained on your company's internal policy documents, patient health records, or machinery manuals.

So, applications built directly on top of GPT-3 may be of limited practical use to enterprises. A more lucrative monetization scheme could be to host GPT-3-like models as APIs specialized for specific problems such as drug discovery, insurance policy recommendation, financial report summarization, and machinery maintenance planning.

The end result would be an application built on a model that is itself built on top of another model. A specialized model built by an enterprise on its proprietary data will also need to adapt to new knowledge obtained from business documents in order to stay relevant. In the future, we will see more domain language models with an active learning capability. And we will most likely see an active learning business model from GPT-3 eventually, too, where organizations will be able to train an instance incrementally on their custom data. However, this will come at a significant price, since it will require OpenAI to host a unique instance for each such customer.

Dattaraj Rao is Innovation and R&D Architect at Persistent Systems and author of the book Keras to Kubernetes: The Journey of a Machine Learning Model to Production. At Persistent Systems, he leads the AI Research Lab. He has 11 patents in machine learning and computer vision.
