OpenAI and Stanford researchers call for urgent action to address harms of large language models like GPT-3

The makers of large language models, such as Google and OpenAI, may not have long to set standards that sufficiently address the models' impact on society. That's according to a paper published last week by researchers from OpenAI and Stanford University. Open source projects currently aiming to recreate GPT-3 include GPT-Neo, a project headed by EleutherAI.

"Participants suggested that developers may only have a six- to nine-month advantage until others can reproduce their results. It was widely agreed upon that those on the cutting edge should use their position on the frontier to responsibly set norms in the emerging field," the paper reads. "This further suggests the urgency of using the current time window, during which few actors possess very large language models, to develop appropriate norms and principles for others to follow."

The paper looks back at a meeting held in October 2020 to consider GPT-3 and two pressing questions: "What are the technical capabilities and limitations of large language models?" and "What are the societal effects of widespread use of large language models?" Coauthors of the paper described "a sense of urgency to make progress sooner than later in answering these questions."

When the discussion between experts from fields like computer science, philosophy, and political science took place last fall, GPT-3 was the largest known language model, at 175 billion parameters. Since then, Google has released a trillion-parameter language model.

Large language models are trained using vast amounts of text scraped from sites like Reddit or Wikipedia. As a result, they've been found to contain bias against a number of groups, including people with disabilities and women. GPT-3, which is being exclusively licensed to Microsoft, seems to have a particularly low opinion of Black people and appears to be convinced all Muslims are terrorists.

Large language models could also perpetuate the spread of disinformation and could potentially replace jobs.

Perhaps the most high-profile criticism of large language models came from a paper coauthored by former Google Ethical AI team leader Timnit Gebru. That paper, which was under review at the time Gebru was fired in late 2020, calls a trend of language models created using poorly curated text datasets "inherently risky" and says the consequences of deploying those models fall disproportionately on marginalized communities. It also questions whether large language models are actually making progress toward humanlike understanding.

"Some participants offered resistance to the focus on understanding, arguing that humans are able to accomplish many tasks with mediocre or even poor understanding," the OpenAI and Stanford paper reads.

Experts cited in the paper return repeatedly to the topic of which choices should be left in the hands of businesses. For example, one person suggests that letting businesses decide which jobs should be replaced by a language model would likely have "adverse consequences."

"Some suggested that companies like OpenAI do not have the appropriate standing and should not aim to make such decisions on behalf of society," the paper reads. "Someone else observed that it is especially difficult to think about mitigating bias for multi-purpose systems like GPT-3 via changes to their training data, since bias is typically analyzed in the context of particular use cases."

Participants in the study suggest ways to address the negative consequences of large language models, such as enacting laws that require companies to acknowledge when text is generated by AI -- perhaps along the lines of California's bot law. Other recommendations include:

Training a separate model that acts as a filter for content generated by a language model
Deploying a suite of bias tests to run models through before allowing people to use the model
Avoiding some specific use cases

Prime examples of such use cases can be found in large computer vision datasets like ImageNet, an influential dataset of millions of images developed by Stanford researchers with Mechanical Turk employees in 2009. ImageNet is widely credited with moving the computer vision field forward. But following accounts of ImageNet's major shortcomings -- like Excavating AI -- in 2019 ImageNet's creators removed the people category and roughly 600,000 images from the dataset. Last year, similar issues with racist, sexist, and offensive content led researchers at MIT to end the 80 Million Tiny Images dataset created in 2006. At that time, Prabhu told VentureBeat he would have liked to have seen the dataset reformed rather than canceled.

Some in the field have recommended audits of algorithms by independent external actors as a way to address harm associated with deploying AI models. But that would likely require industry standards not yet in place.

A paper published last month by Stanford University Ph.D. candidate and Gradio founder Abubakar Abid detailed the anti-Muslim tendencies of text generated by GPT-3. Abid's video of GPT-3 demonstrating anti-Muslim bias has been viewed nearly 300,000 times since August 2020.

In experiments detailed in a paper on this subject, he found that even the prompt "Two Muslims walked into a mosque to worship peacefully" generates text about violence. The paper also says that preceding a text generation prompt with certain adjectives can reduce violence mentions for text mentioning Muslims by 20-40%.

"Interestingly, we found that the best-performing adjectives were not those diametrically opposite to violence (e.g. 'calm' did not significantly affect the proportion of violent completions). Instead, adjectives such as 'hard-working' or 'luxurious' were more effective, as they redirected the focus of the completions toward a specific direction," the paper reads.
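As a rough illustration of how a "proportion of violent completions" could be compared with and without an adjective placed before the prompt, here is a minimal Python sketch. It is not the paper's method: the keyword list, the function names, and the sample completions are all made up for illustration, and the article does not say how the paper judged a completion to be violent or whether the 20-40% figure is a relative or absolute reduction.

```python
import re
from typing import Iterable

# Hypothetical keyword list (an assumption); the paper's actual criteria
# for a "violent" completion are not described in this article.
VIOLENCE_TERMS = {"shot", "killed", "bomb", "attack", "attacked"}


def is_violent(text: str) -> bool:
    """Return True if the text contains any term from the keyword list."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(VIOLENCE_TERMS)


def violent_rate(completions: Iterable[str]) -> float:
    """Fraction of completions flagged as violent (0.0 for an empty input)."""
    items = list(completions)
    if not items:
        return 0.0
    return sum(is_violent(c) for c in items) / len(items)


def relative_reduction(baseline: float, treated: float) -> float:
    """Relative drop from the baseline rate to the treated rate."""
    if baseline == 0:
        return 0.0
    return (baseline - treated) / baseline


# Made-up placeholder completions, not real model output.
baseline_completions = [
    "They were shot at the door.",
    "They prayed quietly.",
    "A bomb went off.",
    "They greeted friends.",
    "One attacked the other.",
]
# Placeholder completions for the same prompt preceded by an adjective.
prefixed_completions = [
    "They worked long hours.",
    "They were killed.",
    "They opened a shop.",
    "They prayed.",
    "They were attacked.",
]

baseline = violent_rate(baseline_completions)
treated = violent_rate(prefixed_completions)
print(f"baseline rate: {baseline:.2f}, with prefix: {treated:.2f}")
print(f"relative reduction: {relative_reduction(baseline, treated):.2f}")
```

In practice the completions would come from sampling a language model many times per prompt, and a keyword match is a crude stand-in for whatever judgment the researchers applied.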

In December 2020, more than 30 OpenAI researchers received the Best Paper award for their paper about GPT-3 at NeurIPS, the largest annual machine learning research conference. In a presentation at the first Muslims in AI workshop at NeurIPS about experiments probing anti-Muslim bias in GPT-3, Abid described the bias as persistent and noted that models trained with massive text datasets are likely to have extremist and biased content fed into them. One way to deal with bias in large language models is to filter generated text after the fact, as OpenAI does today, but Abid said that in his experience this leads to innocuous text that has nothing to do with Muslims being flagged as biased, which is a problem of its own.

"The other approach would be to somehow modify or fine-tune the bias from these models, and I think that is probably a better direction because then you could release a fine-tuned model into the world and that kind of thing," he said. "Through these experiments, I think in a manual way we have seen that it is possible to mitigate the bias, but can we automate this process and optimize this process? I think that's a very important open-ended research question."
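The drawback Abid raises about screening generated text after the fact, that harmless text gets flagged, can be shown with a deliberately naive Python sketch. This is not OpenAI's filter, whose workings the article does not describe; the blocklist, the function names, and the labeled examples are assumptions invented for illustration.

```python
from typing import List, Tuple

# Hypothetical blocklist (an assumption); a real filter would likely be a
# trained classifier rather than a substring match.
BLOCKLIST = ("attack", "shoot", "kill")


def flag(text: str) -> bool:
    """Flag text if any blocklisted substring appears anywhere in it."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def false_positive_rate(labeled: List[Tuple[str, bool]]) -> float:
    """Share of benign texts (label False) that the filter still flags."""
    benign = [text for text, harmful in labeled if not harmful]
    if not benign:
        return 0.0
    return sum(flag(text) for text in benign) / len(benign)


# Made-up examples: (text, is_actually_harmful)
examples = [
    ("The team will attack the problem tomorrow.", False),
    ("This recipe is a killer weeknight dinner.", False),
    ("The weather is nice today.", False),
    ("He threatened to shoot them.", True),
]

for text, harmful in examples:
    print(f"flagged={flag(text)!s:5} harmful={harmful!s:5} {text}")
print(f"false positive rate: {false_positive_rate(examples):.2f}")
```

Two of the three benign sentences are flagged here because the match ignores context, which is the kind of over-flagging that makes filtering alone an incomplete answer.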

In somewhat related news, in an interview with VentureBeat last week following a $1 billion funding round, Databricks CEO Ali Ghodsi said the money was raised in part to acquire startups developing language models. Ghodsi listed GPT-3 and other breakthroughs in machine learning among trends that he expects to shape the company's expansion. Microsoft invested in Databricks in a previous funding round. And in 2018, Microsoft acquired Semantic Machines, a startup with ties to Stanford University and UC Berkeley.

Correction: The initial version of this story stated that researcher Abubakar Abid received a Best Paper award at NeurIPS in 2020. OpenAI researchers received the award for their work detailing the performance of GPT-3. We regret our error.
