Researchers propose bias fix for GPT-3 and other language models

Few-shot learning, or the ability to learn tasks from a few examples, is a key aspect of human intelligence. Large language models like OpenAI's GPT-3 can perform few-shot learning without fine-tuning. But despite the promise of few-shot learning, new research finds that the accuracy of language models, particularly GPT-3, can be "highly unstable" absent calibration.

The research, which was coauthored by scientists at UC Berkeley, UC Irvine, and the University of Maryland, is the latest to find flaws in GPT-3 and other models like it. OpenAI itself notes that GPT-3 places words like "naughty" or "sucked" near female pronouns and "Islam" near words like "terrorism." A paper by Stanford University Ph.D. candidate and Gradio founder Abubakar Abid detailed the anti-Muslim tendencies of text generated by GPT-3. And the Middlebury Institute of International Studies' Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 could reliably generate "informational" and "influential" text that might "radicalize individuals into violent far-right extremist ideologies and behaviors."

Operating on the assumption that GPT-3 is susceptible to certain kinds of instability, the researchers benchmarked the model via the OpenAI API using training examples from datasets for text classification, fact retrieval, and information extraction. The examples were presented in a range of formats and orderings, including question-answer templates, conversation-style templates, and prompts that resembled particular web pages.

In their experiments, the researchers found that different choices of prompt format and example ordering could cause large swings in accuracy. For example, on a sentiment classification task, merely changing the order of the training examples in the prompt shifted GPT-3's accuracy from near-chance (54%) to near-state-of-the-art (93%). Interestingly, adding more training examples to the prompt didn't necessarily reduce the variance in accuracy, and some training examples even hurt accuracy.
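To illustrate what "changing the order" of training examples means in practice, here is a minimal Python sketch that builds one few-shot prompt for every ordering of four labeled demonstrations. The "Review:/Sentiment:" template, the example reviews, and the helper name `build_prompt` are hypothetical, not the researchers' actual prompts, and the call to GPT-3 that would score each prompt is deliberately omitted.

```python
from itertools import permutations


def build_prompt(demos, query, template="Review: {text}\nSentiment: {label}\n\n"):
    """Concatenate labeled demonstrations in the given order, then the unlabeled query.

    demos: list of (text, label) pairs; the template is a hypothetical format.
    """
    body = "".join(template.format(text=text, label=label) for text, label in demos)
    return body + f"Review: {query}\nSentiment:"


# Hypothetical demonstrations for a binary sentiment task.
demos = [
    ("A gripping, beautifully acted film.", "Positive"),
    ("Dull and far too long.", "Negative"),
    ("I laughed the whole way through.", "Positive"),
    ("The plot makes no sense.", "Negative"),
]
query = "An instant classic."

# One prompt per ordering: same examples, same query, different order.
prompts = [build_prompt(list(order), query) for order in permutations(demos)]
print(len(prompts))  # 24
```

With four demonstrations there are 24 orderings. The prompts contain identical information and differ only in sequence, which is the kind of variation the study found could move accuracy between near-chance and near-state-of-the-art.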

The researchers say they identified three pitfalls that bias language models like GPT-3 toward certain answers: majority label bias, recency bias, and common token bias. Majority label bias leads the model to predict answers that appear frequently among the examples in the prompt, while recency bias leads it to favor answers that appear near the end of the prompt. Common token bias, by contrast, leads the model to prefer answers that are frequent in its pretraining data, for example "United States" over "Saint Lucia."
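Because majority label bias and recency bias depend only on the labels in the prompt, the answers they push toward can be read off directly. The minimal Python sketch below, using a hypothetical helper `bias_targets` and made-up demonstrations, returns the most frequent label and the last label in a prompt. It does not query a model and says nothing about how strongly GPT-3 is affected; common token bias, which comes from pretraining data, cannot be read from the prompt this way.

```python
from collections import Counter


def bias_targets(demos):
    """Return (majority_label, most_recent_label) for demonstrations in prompt order.

    These are the answers that majority label bias and recency bias, respectively,
    would tend to favor. demos: list of (text, label) pairs.
    """
    if not demos:
        raise ValueError("need at least one demonstration")
    labels = [label for _, label in demos]
    majority = Counter(labels).most_common(1)[0][0]  # most frequent label
    recent = labels[-1]                              # label closest to the query
    return majority, recent


# Hypothetical prompt: three positive examples, then one negative example last.
demos = [
    ("Great acting.", "Positive"),
    ("Loved every minute.", "Positive"),
    ("A fun ride.", "Positive"),
    ("Boring.", "Negative"),
]
print(bias_targets(demos))  # ('Positive', 'Negative')
```

In this example the two biases pull in opposite directions: frequency favors "Positive," while position favors "Negative."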

The researchers attempted to counteract these biases by "calibrating" the output distribution. They estimated the model's bias toward certain answers by feeding it content-free dummy inputs (e.g., "N/A"), then fitted the calibration parameters so that the content-free input received uniform scores across answers, which they claim provides a good setting of the parameters without any additional training data.
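Here is a minimal sketch of such a calibration step, assuming the model's probabilities for each candidate answer are already available (it does not call the OpenAI API). It assumes one simple way to make the content-free input score uniformly: divide each answer's probability by that answer's probability on the content-free input, then renormalize. In matrix terms, this is a diagonal weight matrix equal to the inverse of the content-free probabilities, with zero bias. The function name and all numbers are hypothetical.

```python
def contextual_calibration(p_cf, p_test):
    """Rescale answer probabilities so the content-free input would score uniformly.

    p_cf:   model probabilities for each answer given a content-free input like "N/A"
    p_test: model probabilities for the same answers, in the same order, for a real input
    """
    if len(p_cf) != len(p_test):
        raise ValueError("p_cf and p_test must cover the same answers")
    # Diagonal rescaling (W = diag(p_cf)^-1, b = 0): answers the model favors
    # even without content are scaled down, disfavored answers are scaled up.
    scores = [pt / pc for pt, pc in zip(p_test, p_cf)]
    # Renormalize; any constant scaling of p_cf or p_test cancels out here.
    total = sum(scores)
    return [s / total for s in scores]


labels = ["Positive", "Negative"]
p_cf = [0.7, 0.3]    # hypothetical: the model leans "Positive" even on "N/A"
p_test = [0.6, 0.4]  # hypothetical raw probabilities for a real review

q = contextual_calibration(p_cf, p_test)
raw_pred = labels[max(range(len(labels)), key=lambda i: p_test[i])]
cal_pred = labels[max(range(len(labels)), key=lambda i: q[i])]
print(raw_pred, cal_pred)        # Positive Negative
print([round(x, 3) for x in q])  # [0.391, 0.609]
```

Applied to the content-free probabilities themselves, the function returns uniform scores, which is the condition the researchers fit for. Applied to a real input, it can flip a prediction that was driven mainly by the model's prior lean toward one answer rather than by the input's content.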

Their experiments showed that calibration consistently improved GPT-3's accuracy across prompt formats and training examples while also making that accuracy more stable. "Through a detailed analysis, we identify that this volatility arises from biases in language models, e.g., their tendency to output recent or common tokens," the coauthors wrote in a paper describing their work. "We use these insights to develop contextual calibration -- a simple procedure to adjust the model's output probabilities -- which improves accuracy, reduces variance, and overall makes tools like GPT-3 more effective for end users."