OpenAI today announced the launch of an API for accessing new natural language processing models its researchers developed, including the recently released GPT-3. The company claims that, unlike most AI systems designed for one use case, the API provides a general-purpose “text in, text out” interface, allowing users to try it out on virtually any English language task.
The API is available in beta for free for the first two months, and only qualified customers will be provided access, according to OpenAI — there’s a sign-up process. (Companies like Algolia, Koko, MessageBird, Sapling, Replika, Casetext, Quizlet, and Reddit, along with researchers at institutions like the Middlebury Institute, piloted it prior to launch.) The company says the API will both provide a source of revenue to cover its costs and enable it to work closely with partners to see what challenges arise when AI systems are used in the real world.
“The field’s pace of progress means that there are frequently surprising new applications of AI, both positive and negative. We will terminate API access for obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing,” OpenAI wrote in a blog post. “[This] model allows us to more easily respond to misuse of the technology. Since it is hard to predict the downstream use cases of our models, it feels inherently safer to release them via an API and broaden access over time, rather than release an open source model where access cannot be adjusted if it turns out to have harmful applications.”
Given any text prompt, OpenAI’s API returns a text completion that attempts to match the pattern provided to it. A developer can “program” it by showing it just a few examples of what they’d like it to do; its success varies with the complexity of the task. The API can also hone its performance on specific tasks by training on a data set of provided examples, or by learning from human feedback given by either users or labelers.
For example, the API can identify relevant content for natural language queries without using keywords. It also enables complex discussions — with a brief prompt, it generates dialogues spanning a range of topics, from space travel to history — and the transformation of text into simplified summaries. It’s even able to complete code based on function names and comments; to generate spreadsheet tables with suggested categories; and to translate natural language to Unix commands using a handful of representative samples.
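To make the “programming by example” idea concrete, here is a minimal Python sketch of how a developer might assemble a few-shot prompt for the natural-language-to-Unix-command case. It only builds the prompt text and does not call OpenAI’s API. The function name, the “Description:”/“Command:” labels, and the sample pairs are all assumptions made for illustration, not a format OpenAI has documented.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format example pairs plus a new query into a single text prompt.

    The "Description:"/"Command:" labels are a hypothetical convention
    chosen for this sketch, not a format required by any API.
    """
    blocks = [f"Description: {desc}\nCommand: {cmd}" for desc, cmd in examples]
    # Leave the final block open after "Command:" so that a completion
    # model would be expected to continue the pattern.
    blocks.append(f"Description: {query}\nCommand:")
    return "\n\n".join(blocks)


# Hypothetical sample pairs; a real prompt would use examples from the task.
EXAMPLES = [
    ("list all files, including hidden ones, in the current directory", "ls -a"),
    ("show the first 10 lines of notes.txt", "head -n 10 notes.txt"),
    ("count the lines in data.csv", "wc -l data.csv"),
]

prompt = build_few_shot_prompt(EXAMPLES, "print the current working directory")
print(prompt)
```

The resulting string would be sent as the “text in,” and whatever the model returns after the final “Command:” would be the “text out.” How well that works would depend, as OpenAI notes, on the complexity of the task.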
“We’ve designed the API to be both simple for anyone to use but also flexible enough to make machine learning teams more productive. In fact, many of our teams are now using the API so that they can focus on machine learning research rather than distributed systems problems,” OpenAI continued. “We’re hopeful that the API will make powerful AI systems more accessible to smaller businesses and organizations.”
OpenAI publishes studies in AI subfields from computer vision to natural language processing (NLP), with the stated mission of safely creating superintelligent software. The startup began in 2015 as a nonprofit but later restructured as a capped-profit company under OpenAI LP, an investment vehicle.
Perhaps anticipating backlash from the AI community, OpenAI says the API will monetarily support its ongoing research, safety, and policy efforts. Certainly, OpenAI’s advancements haven’t come cheap — GPT-3 alone is estimated to have a memory requirement exceeding 350GB and training costs exceeding $12 million. To fund them, OpenAI previously secured a $1 billion endowment from its founding members and investors and a $1 billion investment from Microsoft, a portion of which funded the development of an AI supercomputer running on Azure. So far, OpenAI LP has attracted funds from Reid Hoffman’s charitable foundation and Khosla Ventures.
The API will also inform the development of the large models underlying it, according to OpenAI, as the company continues to research potential misuses, including with third-party researchers through its academic access program. The goal over time is to develop a “thorough understanding” of the API’s potential harms and to continually improve the tools and processes that help minimize them, OpenAI says.
“Mitigating negative effects such as harmful bias is a hard, industry-wide issue that is extremely important. Ultimately, our API models do exhibit biases that will appear on occasion in generated text,” wrote OpenAI. “[That’s why] we’re developing usage guidelines with users to help them learn from each other and mitigate these problems in practice. [We’re also] working closely with users to deeply understand their use cases and develop tools to label and intervene on manifestations of harmful bias, [and we’re] conducting our own research into harmful bias and broader issues in fairness and representation, which will help inform our work with our users.”
In the past, OpenAI has adopted a cautious and controversial approach to mitigation. Citing concerns about misuse and the potential automation of deepfakes by malicious actors, it chose not to release all four versions of GPT-2, which achieved leading results on a range of tasks, when the model made its debut last February. Critics of the decision said that the failure to release source code posed a potential threat to society and to scientists who lack the resources to replicate the model or its results. Others called it a publicity stunt.
OpenAI subsequently released several smaller and less complex versions of GPT-2 and studied their reception as well as the data sets on which they were trained. After concluding that there was “no strong evidence” of misuse, it published the full model, which was trained on 8 million text documents scraped from the web, in November.