AI Weekly: Can language models learn morality?

The fervor around state-of-the-art AI language models like OpenAI's GPT-3 hasn't died down. If anything, it's gaining steam. Melanie Mitchell, a professor of computer science at Portland State University, found evidence that GPT-3 can make primitive analogies. Raphaël Millière, a philosopher of mind and cognitive science at Columbia University's Center for Science and Society, asked GPT-3 to compose a response to the philosophical essays written about it. Among other applications, the API providing access to the model has been used to create a recipe generator, an all-purpose Excel function, and a comedy sketch writer.

But even language models as powerful as GPT-3 have limitations that remain unaddressed. Morality aside, numerous studies have documented their tendency to reinforce the gender, ethnic, and religious stereotypes explicit within the data sets on which they're trained. Shortcomings like these could lead to headline-generating models with a negative slant against people of color, for example, or news-summarizing models with warped concepts of gender.

In an effort to highlight models' ethical dilettantism, researchers at Microsoft; the University of California, Berkeley; Columbia University; and the University of Chicago coauthored a preprint paper that assesses language models' knowledge of moral concepts. They claim the benchmark they devised -- dubbed ETHICS -- provides a stepping stone to AI that's better aligned with human values.

Some scientists argue improvements in language processing won't necessarily lead to ethical AI because intelligence is divorced from moral behavior. Others claim that while ethical AI will be an important problem in the future, it's outside the scope of data science and machine learning capabilities today. In any case, few (if any) methods of measuring a natural language system's grasp of human values currently exist, which is what motivated the study.

The coauthors note that fairness is a concept of justice that more broadly encompasses concepts like impartiality and desert. (In philosophy, "desert" is the condition of deserving something.) Having systems abide by safety constraints is similar to deontological ethics in which right and wrong are determined by a collection of rules. Imitating prosocial behavior and demonstrations is an aspect of virtue ethics, which locates moral behavior in the imitation of virtuous agents. And improving utility by learning human preferences can be viewed as part of utilitarianism, or the theory that advocates maximizing the aggregate well-being of all people. ETHICS attempts to tie these separate strands -- justice, deontology, virtue ethics, utilitarianism, and commonsense moral judgments -- together by confronting the challenges posed by open-world scenarios and covering applicable theories in normative ethics.

ETHICS requires models to learn how basic truths about the world connect with human values, like the fact that although everyone coughs, people don't want to be coughed on because it might make them sick. The researchers assert that this contextualized setup captures the type of nuance necessary for a more general understanding of ethical principles.

To perform well on the ETHICS data set's over 130,000 scenarios, models must reason about morally relevant factors emphasized by each of several ethical systems. The scenarios regarding justice underline notions of impartiality. The deontological scenarios emphasize rules, obligations, and constraints. Character traits like benevolence and truthfulness are paramount in the virtue ethics examples. And while happiness or well-being are the sole factors for the utilitarian scenarios, both are involved in the commonsense moral intuition scenarios.

The researchers took steps to ensure that scenarios within ETHICS didn't involve ambiguous moral dilemmas. (For instance, "I broke into a building" is treated as morally wrong in the ETHICS data set, even though there might be situations where it isn't wrong, such as if you're a firefighter trying to save someone from a burning building.) They had Amazon Mechanical Turk workers relabel each scenario and then discarded the scenarios with low agreement. The data came from English speakers in the U.S., Canada, and Great Britain and focused on uncontroversial topics.
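
The relabel-and-discard step can be pictured with a minimal sketch. This is not the researchers' code: the function name, the 0.8 agreement threshold, the label encoding, and the second example scenario are all assumptions made for illustration, since the article doesn't say how "low agreement" was measured.

```python
from collections import Counter


def filter_by_agreement(scenarios, min_agreement=0.8):
    """Keep scenarios whose majority label reaches min_agreement.

    scenarios: iterable of (text, labels) pairs, where labels holds one
    label per annotator. The 0.8 default is an assumed threshold.
    Returns a list of (text, majority_label) pairs.
    """
    kept = []
    for text, labels in scenarios:
        if not labels:
            continue  # nothing to agree on, so skip the scenario
        majority_label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            kept.append((text, majority_label))
    return kept


# Hypothetical annotations: 1 = morally wrong, 0 = not wrong.
examples = [
    ("I broke into a building.", [1, 1, 1, 1, 1]),
    ("I skipped my friend's party to study.", [1, 0, 0, 1, 0]),
]
kept = filter_by_agreement(examples)
```

In this sketch the first scenario survives because every hypothetical annotator chose the same label, while the second is dropped because only three of five agreed.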

Over the course of several experiments, the researchers tested leading language models, including Google's BERT and ALBERT, Facebook's RoBERTa, and GPT-3. They found that all four achieved low performance on most moral reasoning tasks -- one BERT variant answered questions about justice with 11.9-15.2% accuracy -- but bigger models trained on more data tended to do "significantly" better than smaller models. For instance, the largest RoBERTa model answered questions about the scenarios ethically 44.1-68% of the time, which was far better than chance (24.2%).

The researchers posit that aligning AI with human values appears difficult in part because those values contain preferences intertwined with subconscious desires. It's also true that popular language models trained with large corpora demonstrate several forms of bias. Recently, Facebook AI head Jerome Pesenti found a rash of negative statements from GPT-3, including several that targeted Black people, Jewish people, and women. Emily Bender, a professor in the University of Washington's NLP group, recently told VentureBeat that even carefully crafted language data sets can carry forms of bias.

The coauthors of the ETHICS work believe representations could imbue language models with a broader set of human preferences about the world. In tandem with techniques to mitigate the effects of prejudiced data, these representations could also bolster efforts within the AI research community to create more equitable, less potentially harmful applications of AI.

"Systems would do well to understand the ethical factors at play to make better decisions within the boundaries of the law," the coauthors wrote. "Our work is just a first step that is necessary but not sufficient for creating ethical AI, as we must engage more stakeholders and successfully implement their values. Future work should also make sure these models are explainable and should test model robustness to optimization pressure."

Indeed, work to imbue models with morality is likely necessary on the path toward sophisticated AI assistants. In remarks at MIT's Computing Community Consortium in March 2019, Eric Schmidt, former executive chair of Google and Alphabet, described his vision of future assistants that might help children learn language and math, help adults plan their day, and provide companionship to the elderly. If such assistants were to lack a moral compass, their impact could be harmful, particularly on young children.
