OpenAI today said it plans to release a version of GPT-2, an advanced conversational AI model that stirred controversy after its release in February. The version released today has 774 million parameters and follows the release of smaller versions for select researchers in February and May.
OpenAI also shared an open source legal agreement today to help companies that create large AI models establish their own model-sharing agreements. The full model, with roughly 1.5 billion parameters, has not yet been released, though OpenAI said it has spoken with five organizations that have replicated the model since February.
Details about OpenAI’s staged release approach to GPT-2 were delivered today in a white paper by OpenAI researchers and Harvard University research associate Ariel Herbert-Voss.
The paper also describes why OpenAI chose to delay a full release.
“As performance across dimensions — such as the reliability of generating coherent text — tends to improve with model size, we decided not to release all four GPT-2 models simultaneously due to concerns about the larger models being misused. By staggering releases, we allow time for risk analyses and use findings from smaller models to inform the actions taken with larger ones,” the paper reads.
Citing concern about misuse and the potential automation of deepfakes by malicious actors, OpenAI chose not to share all four versions of the model when GPT-2 made its debut in February with state-of-the-art results on a range of tasks. GPT-2 was trained on 40 gigabytes of internet text.
Initial critics of OpenAI’s approach said that failure to release source code posed a potential threat to society and scientists who lack the resources to replicate the model or its results.
To establish that the risk of misinformation spread is lower than was thought at the time of GPT-2's release, OpenAI worked with security experts, monitored how people used GPT-2, and “conducted in-house research into automated detection, biases, and misuse potential.”
To continue exploring the potential for misuse and how to safely release large models like GPT-2, OpenAI established partnerships with the University of Oregon; the University of Texas, Austin; the Middlebury Institute of International Studies; and Cornell University.
Conversely, potential positive use cases of GPT-2 cited in the paper include helping writers do their jobs, code automation for software engineers, better chatbots, and answering questions about health.
An analysis by Cornell University researchers, published in Foreign Affairs earlier this month, found that more than 70% of people who read text generated by GPT-2 judged it credible as a New York Times article.
The paper cites a staggered release approach used by the Allen Institute for Artificial Intelligence and Hugging Face NLP as a possible model for releasing large language understanding models in the future.
“We think that a combination of staged release and partnership-based model sharing is likely to be a key foundation of responsible publication in AI, particularly in the context of powerful generative models,” researchers said in a blog post that shares links to the legal agreement and paper.
“The issues inherent to large models are going to grow, rather than diminish, over time. We hope that our work with GPT-2 will help provide evidence the AI community can draw on when thinking about the publication challenges inherent to some parts of AI research.”