OpenAI is reducing the price of the GPT-3 API — here's why it matters

OpenAI is slashing the price of its GPT-3 API service by up to two-thirds, according to an announcement on the company’s website. The new pricing plan, which is effective September 1, may have a large impact on companies that are building products on top of OpenAI’s flagship large language model (LLM).

The announcement comes as recent months have seen growing interest in LLMs and their applications in different fields. And service providers will have to adapt their business models to the shifts in the LLM market, which is rapidly growing and maturing.

The new pricing of the OpenAI API highlights some of these shifts that are taking place.

A bigger market with more players

The transformer architecture, introduced in 2017, paved the way for current large language models. Transformers are well suited to processing sequential data like text, and they are much more efficient at scale than their predecessors (RNNs and LSTMs). Researchers have consistently shown that transformers become more powerful and accurate as they are made larger and trained on larger datasets.

In 2020, researchers at OpenAI introduced GPT-3, which proved to be a watershed moment for LLMs. GPT-3 showed that LLMs are “few-shot learners”: they can perform new tasks without additional training, simply by being shown a few examples on the fly. But instead of making GPT-3 available as an open-source model, OpenAI decided to release a commercial API as part of its effort to find ways to fund its research.
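To make the idea of few-shot learning concrete, here is a minimal sketch of how a few-shot prompt might be assembled. The function name, the "Input:/Output:" layout and the sample sentences are illustrative assumptions, not a format prescribed by OpenAI; the point is only that the examples travel inside the prompt and no model weights are changed.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a prompt from (input, output) example pairs plus a new input.

    The "Input:/Output:" layout is a hypothetical convention chosen for
    illustration; any consistent pattern could be used instead.
    """
    lines = [instruction] if instruction else []
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
    # The final entry is left open so the model completes the pattern.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Two made-up demonstrations followed by the sentence we want classified.
examples = [
    ("The service was quick and friendly.", "positive"),
    ("My order arrived late and damaged.", "negative"),
]
prompt = build_few_shot_prompt(
    examples, "The app keeps crashing.", "Classify the sentiment."
)
print(prompt)
```

The resulting string would be sent to the model as ordinary input text. Longer prompts with more examples consume more tokens per request, which is one reason per-token pricing matters to application developers.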

GPT-3 increased interest in LLM applications. A host of companies and startups started creating new applications with GPT-3 or integrating the LLM in their existing products.

The success of GPT-3 encouraged other companies to launch their own LLM research projects. Google, Meta, Nvidia and other large tech companies accelerated work on LLMs. Today, there are several LLMs that match or outpace GPT-3 in size or benchmark performance, including Meta’s OPT-175B, DeepMind’s Chinchilla, Google’s PaLM and Nvidia’s Megatron MT-NLG.

GPT-3 also triggered the launch of several open-source projects that aimed to make LLMs available to a wider audience. BigScience’s BLOOM and EleutherAI’s GPT-J are two examples of open-source LLMs that are available free of charge.

And OpenAI is no longer the only company that is providing LLM API services. Hugging Face, Cohere and Humanloop are some of the other players in the field. Hugging Face provides a large variety of different transformers, all of which are available as downloadable open-source models or through API calls. Hugging Face recently released a new LLM service powered by Microsoft Azure, which OpenAI also uses for its GPT-3 API.

The growing interest in LLMs and the diversity of solutions are two elements that are putting pressure on API service providers to reduce their profit margins to protect and expand their total addressable market.

Hardware advances

One of the reasons that OpenAI and other companies decided to provide API access to LLMs is the technical challenge of training and running the models, which many organizations can’t handle. While smaller machine learning models can run on a single GPU, LLMs require dozens or even hundreds of GPUs.

Aside from huge hardware costs, managing LLMs requires experience in complicated distributed and parallel computing. Engineers must split the model into multiple parts and distribute them across several GPUs, which then run the computations in parallel and in sequence. This process is prone to failure and requires ad-hoc solutions for different types of models.
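As a rough illustration of what splitting a model across devices involves, the toy sketch below assigns contiguous groups of layers to devices and then passes an input through the groups one after another. Everything here is a simplifying assumption: the "layers" are plain Python functions, the "devices" are just indices, and the stages run one at a time in a single process. Real systems place each stage on a separate GPU, move activations between them, and overlap work across batches, which is where most of the difficulty lies.

```python
def partition_layers(num_layers, num_devices):
    """Assign contiguous layer ranges to devices as evenly as possible."""
    if num_devices <= 0 or num_layers < num_devices:
        raise ValueError("need at least one layer per device")
    base, extra = divmod(num_layers, num_devices)
    stages = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def run_pipeline(x, layers, stages):
    """Run the input through each stage in order (simulated, single process)."""
    for stage in stages:
        for i in stage:
            x = layers[i](x)
        # In a real deployment, x would be transferred to the next GPU here.
    return x


# Ten toy "layers"; layer k simply adds k to its input.
layers = [lambda v, k=k: v + k for k in range(10)]
stages = partition_layers(len(layers), 4)
result = run_pipeline(0, layers, stages)
print([list(s) for s in stages], result)
```

Even in this simplified form, the layer-to-device assignment depends on the model's depth and the number of devices, which hints at why different model types tend to need their own partitioning schemes.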

But with LLMs becoming commercially attractive, there is growing incentive to create specialized hardware for large neural networks.

OpenAI’s pricing page states the company has made progress in making the models run more efficiently. Previously, OpenAI and Microsoft had collaborated to create a supercomputer for large neural networks. The new announcement from OpenAI suggests that the research lab and Microsoft have managed to make further progress in developing better AI hardware and reducing the costs of running LLMs at scale.

Again, OpenAI faces competition here. An example is Cerebras, which has created a huge AI processor that can train and run LLMs with billions of parameters at a fraction of the costs and without the technical difficulties of GPU clusters.

Other big tech companies are also improving their AI hardware. Google introduced the fourth generation of its TPU chips last year and its TPU v4 pods this year. Amazon has also released special AI chips, and Meta is developing its own AI hardware. It wouldn’t be surprising to see the other tech giants use their hardware capabilities to try to secure a share of the LLM market.

Fine-tuned LLMs remain off limits — for now

An interesting detail in OpenAI’s new pricing model is that it will not apply to fine-tuned GPT-3 models. Fine-tuning is the process of retraining a pretrained model on a set of application-specific data. Fine-tuning improves the performance and stability of neural networks on the target application. It also reduces inference costs by allowing developers to use shorter prompts or smaller fine-tuned models to match the performance of a larger base model on their specific application.

For example, if a bank was previously using Davinci (the largest GPT-3 model) for its customer service chatbot, it can fine-tune the smaller Curie or Babbage models on company-specific data. This way, it can achieve the same level of performance at a fraction of the cost.

At current rates, fine-tuned models cost double their base model counterparts. After the price change, fine-tuned models will cost 4-6x as much as their base counterparts. Some have speculated that fine-tuned models are where OpenAI is really making money with the enterprise, which is why the prices won’t change.

Another reason might be that OpenAI still doesn’t have the infrastructure to reduce the costs of fine-tuned models (as opposed to base GPT-3, where all customers use the same model, fine-tuned models require one GPT-3 instance per customer). If so, we can expect the prices of fine-tuning to drop in the future.
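The 4-6x figure cited above follows from simple arithmetic: if fine-tuned prices stay fixed at twice the old base price while the base price is cut, the ratio grows to 2 / (1 - cut). The short sketch below works this out. The function name and the specific cuts of one-half and two-thirds are illustrative assumptions; the article only states that cuts go "up to two-thirds" and does not list per-model prices.

```python
from fractions import Fraction


def fine_tuned_to_base_ratio(old_ratio, base_price_cut):
    """Ratio of fine-tuned price to base price after only the base price is cut.

    old_ratio: fine-tuned price divided by base price before the change.
    base_price_cut: fraction by which the base price is reduced (0 <= cut < 1).
    """
    if not 0 <= base_price_cut < 1:
        raise ValueError("base_price_cut must be in [0, 1)")
    return old_ratio / (1 - base_price_cut)


# Assumed cuts for illustration: one-half and two-thirds of the base price.
for cut in (Fraction(1, 2), Fraction(2, 3)):
    ratio = fine_tuned_to_base_ratio(Fraction(2), cut)
    print(f"base price cut of {cut}: fine-tuned costs {ratio}x base")
```

Under these assumptions, a cut of one-half gives a 4x gap and a cut of two-thirds gives a 6x gap, which matches the range quoted above.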

It will be interesting to see what other directions the LLM market will take in the future.
