New startup shows how emotion-detecting AI is intrinsically problematic

In 2019, a team of researchers published a meta-review of studies claiming a person's emotion can be inferred from their facial movements. They concluded that there’s no evidence emotional state can be predicted from expression – regardless of whether a human or technology is making the determination.

"[Facial expressions] in question are not 'fingerprints' or diagnostic displays that reliably and specifically signal particular emotional states regardless of context, person, and culture," the coauthors wrote. "It is not possible to confidently infer happiness from a smile, anger from a scowl, or sadness from a frown."

Alan Cowen might disagree with this assertion. An ex-Google scientist, he's the founder of Hume AI, a new research lab and "empathetic AI" company emerging from stealth today. Hume claims to have developed datasets and models that "respond beneficially to cues of [human] emotions," enabling customers ranging from large tech companies to startups to measure emotions from a person's facial, vocal, and verbal expressions.

"When I got into the field of emotion science, most people were studying a handful of posed emotional expressions in the lab. I wanted to use data science to understand how people really express emotion out in the world, across demographics and cultures," Cowen told VentureBeat via email. "With new computational methods, I discovered a new world of subtle and complex emotional behaviors that nobody had documented before, and pretty soon I was publishing in the top journals. That’s when companies began reaching out."

Hume -- which has ten employees and recently raised $5 million in funding -- says that it uses "large, experimentally-controlled, culturally diverse" datasets from people spanning North America, Africa, Asia, and South America to train its emotion-recognizing models. But some experts dispute the idea that there's a scientific foundation for emotion-detecting algorithms, regardless of the data's representativeness.

"The nicest interpretation I have is that these are some very well-intentioned people who, nevertheless, are ignorant enough that ... it's tech causing the problem they're trying to fix," Os Keyes, an AI ethics scientist at the University of Washington, told VentureBeat via email. "Their starting product raises serious ethical questions ... [It's clear that they aren't] thoughtfully treating the problem as a problem to be solved, engaging with it deeply, and considering the possibility [that they aren't] the first person to think of it."

Measuring emotion with AI

Hume is one of several companies in the burgeoning "emotional AI" market, which includes HireVue, Entropik Technology, Emteq, Neurodata Labs, Nielsen-owned Innerscope, Realeyes, and Eyeris. Entropik claims its technology, which it pitches to brands looking to measure the impact of marketing efforts, can understand emotions "by facial expressions, eye gaze, voice tonality, and brainwaves." Neurodata developed a product that's being used by Russian bank Rosbank to gauge the emotion of customers calling in to customer service centers.

It's not just startups that are investing in emotion AI. In 2016, Apple acquired Emotient, a San Diego firm working on AI algorithms that analyze facial expressions. Amazon's Alexa apologizes and asks for clarification when it detects frustration in a user's voice. Speech recognition company Nuance, which Microsoft purchased in April 2021, has demoed a product for cars that analyzes drivers' emotions from their facial cues. And Affectiva, an MIT Media Lab spin-out that once claimed it could detect anger or frustration in speech in 1.2 seconds, was snatched up by Swedish company Smart Eye in May.

The emotion AI industry is projected to almost double in size from $19 billion in 2020 to $37.1 billion by 2026, according to Markets and Markets. Venture capitalists, eager to get in on the ground floor, have collectively invested tens of millions of dollars in companies like Affectiva, Realeyes, and Hume. As the Financial Times reports, film studios such as Disney and 20th Century Fox are using the technology to measure reactions to upcoming shows and movies. Meanwhile, marketing firms have tested it to see how audiences respond to advertisements for clients like Coca-Cola and Intel.

The problem is that there exist few -- if any -- universal markers of emotion, which calls the accuracy of emotion AI into question. The majority of emotion AI startups base their work on psychologist Paul Ekman's seven fundamental emotions (happiness, sadness, surprise, fear, anger, disgust, and contempt), which he proposed in the early '70s. But subsequent research has confirmed the common-sense notion that there are major differences in the way that people from different backgrounds express how they're feeling.

Factors like context, conditioning, relationality, and culture influence the way people respond to experiences. For example, scowling -- often associated with anger -- has been found to occur less than 30% of the time on the faces of angry people. The expression supposedly universal for fear is the stereotype for a threat or anger in Malaysia. Ekman himself later showed that there are differences between how American and Japanese students react to violent films, with Japanese students adopting "a completely different set of expressions" if someone else is in the room -- particularly an authority figure.

Gender and racial biases are a well-documented phenomenon in facial analysis algorithms, attributable to imbalances in the datasets used to train them. Generally speaking, an AI system trained on images of lighter-skinned people will perform poorly on people whose skin tones are unfamiliar to it. This isn't the only type of bias that can crop up. Retorio, an AI hiring platform, was found to respond differently to the same candidate when the candidate wore different accessories, such as glasses or a headscarf. And in a 2020 study from MIT, the Universitat Oberta de Catalunya in Barcelona, and the Universidad Autonoma de Madrid, researchers showed that algorithms could become biased toward certain facial expressions, like smiling, which could reduce their recognition accuracy.

A separate study by researchers at the University of Cambridge and Middle East Technical University found that at least one of the public datasets often used to train emotion AI systems contains far more Caucasian faces than Asian or Black faces. More recent research highlights the consequences, showing that popular vendors' emotional analysis products assign more negative emotions to Black men's faces than to white men's faces.

Voices, too, cover a broad range of characteristics, including those of people with disabilities, people with conditions like autism, and people who speak other languages and dialects, such as African-American Vernacular English (AAVE). A native French speaker taking a survey in English might pause or pronounce a word with some uncertainty, which could be misconstrued by an AI system as an emotion marker.

Despite the technical flaws, some companies and governments are readily adopting emotion AI to make high-stakes decisions. Employers are using it to evaluate potential employees by scoring them on empathy or emotional intelligence. Schools have deployed it to monitor students’ engagement in the classroom -- and even while they do classwork at home. Emotion AI has also been used to identify "dangerous people" and tested at border control stops in the U.S., Hungary, Latvia, and Greece.

Training the algorithms

To mitigate bias, Hume says that it uses "randomized experiments" to gather "a rich array" of expressions -- facial and vocal -- from "people from a wide range of backgrounds." According to Cowen, the company has collected more than 1.1 million images and videos of facial expressions from over 30,000 different people in the U.S., China, Venezuela, India, South Africa, and Ethiopia, as well as more than 900,000 audio recordings from over 25,000 people voicing their emotions, labeled with people’s self-reported emotional experiences.

Hume's dataset is smaller than Affectiva's, which Affectiva once claimed was the largest of its kind with more than 10 million people's expressions from 87 countries. But Cowen claims that Hume's data can be used to train models to measure "an extremely wide range of expressions," including over 28 distinct facial expressions and 25 distinct vocal expressions.

"As interest in accessing our empathic AI models has increased, we’ve been preparing to ramp up access to them at scale. Thus, we will be launching a developer platform which will provide API documentation and a playground to developers and researchers," Hume said. "We’re also collecting data and training models for social interaction and conversational data, body language, and multi-modal expressions which we anticipate will just expand use cases and our customer base."

Beyond Mursion, Hume says it's working with Hoomano, a startup developing software for "social robots" like Softbank Robotics' Pepper, to create digital assistants that deliver better recommendations by accounting for users' emotions. Hume also claims to have partnered with researchers at Mount Sinai and the University of California, San Francisco to see whether its models can pick up on symptoms of depression and schizophrenia "that no previous methods have been able to capture."

"A person’s emotions broadly influence their behavior, including what they are likely to attend to and click on. Consequently, AI technologies like search engines, social media algorithms, and recommendation systems are already forms of 'emotion AI.' There’s no avoiding it. So decision-makers need to worry about how these technologies are processing and responding to cues of our emotions and affecting their users’ well-being, unbeknownst to their developers," Cowen said. "Hume AI is providing the tools needed to ensure that technologies are designed to improve their users’ well-being. Without tools to measure cues to emotion, there’s no way of knowing how an AI system is processing these cues and affecting people’s emotions, and no hope of designing the system to do so in a manner that is consistent with people’s well-being."

Setting aside the fraught nature of using AI to diagnose mental illness, Mike Cook, an AI researcher at Queen Mary University of London, says that the company's messaging feels "performative" and the discourse suspect. "[T]hey've clearly gone to great pains to talk about diversity and inclusion and stuff, and I'm not going to complain that people are making datasets with more geographic diversity. But it feels a bit like it was massaged by a PR agent who knew the recipe for making your company look like it cares," he said.

Cowen argues that Hume is considering the applications of emotion AI more carefully than its competitors by establishing The Hume Initiative, a nonprofit "dedicated to regulating empathic AI." The Hume Initiative -- whose ethics committee includes Taniya Mishra, the former director of AI at Affectiva -- has released regulatory guidelines that Hume says it'll abide by in commercializing its technologies.

The Hume Initiative's guidelines, a draft of which was shared with VentureBeat, ban applications like manipulation, deception, "optimizing for reduced well-being," and "unbounded" emotion AI. They also lay out constraints for use cases like platforms and interfaces, health and development, and education, for example, requiring educators to ensure that the output of an emotion AI model is used to give constructive -- but non-evaluative -- feedback.

Coauthors of the guidelines include Danielle Krettek Cobb, the founder of the Google Empathy Lab; Dacher Keltner, a professor of psychology at UC Berkeley; and Ben Bland, who chairs the IEEE committee developing standards for emotion AI.

"The Hume Initiative began by listing all of the known use cases for empathic AI. Then, they voted on the first concrete ethical guidelines. The resulting guidelines are unlike any previous approach to AI ethics in that they are concrete and enforceable. They detail the uses of empathic AI that strengthen humanity's greatest qualities of belonging, compassion, and well-being, and those that admit of unacceptable risks," Cowen said. "[T]hose using Hume AI’s data or AI models are required to commit to using them only in compliance with The Hume Initiative’s ethical guidelines, ensuring that any applications that incorporate our technology are designed to improve people’s well-being."

Reasons for skepticism

Recent history is filled with examples of companies touting their internal AI ethics efforts only to have those efforts fall by the wayside -- or prove to be performative and ineffectual. Google infamously dissolved its AI ethics board just one week after forming it. Reports have described Meta’s (formerly Facebook’s) AI ethics team, too, as largely toothless.

This pattern is often referred to as "ethics washing." Put simply, ethics washing is the practice of fabricating or exaggerating a company’s interest in equitable AI systems that work for everyone. A textbook example among tech giants is a company promoting "AI for good" initiatives with one hand while selling surveillance tech to governments and corporations with the other.

In a paper by Trilateral Research, a technology consultancy based in London, the coauthors argue that ethical principles and guidelines do not, by themselves, help practically explore challenging issues such as fairness in emotion AI. These need to be investigated in-depth, they say, to ensure that companies don't implement systems in opposition to society's norms and values. "Without a continuous process of questioning what is or may be obvious, of digging behind what seems to be settled, of keeping alive this interrogation, ethics is rendered ineffective," they wrote. "And thus, the settling of ethics into established norms and principles comes down to its termination."

Cook sees flaws in The Hume Initiative's guidelines as written, particularly in their use of nebulous language. "A lot of the guidelines feel performatively phrased -- if you believe manipulating the user is bad, then you'll see the guidelines and go, 'Yes, I won't do that.' And if you don't care, you'll read the guidelines and go, 'Yes, I can justify this,'" he said.

Cowen stands by the belief that Hume is "open[ing] the door to optimize AI for individual and societal well-being" rather than short-term business goals like user engagement. "We don’t have any true competitors because the other AI models available to measure cues of emotion are very limited. They focus on a very narrow range of facial expressions, completely ignore the voice, and have problematic demographic biases. These biases are woven into the data that AI systems are usually trained on. On top of that, no other company has concrete ethical guidelines for the use of empathic AI," he said. "We are creating a platform that centralizes the deployment of our models and offers users more control over how their data is used."

But guidelines or no, policymakers have already begun to curtail the use of emotion AI technologies. The New York City Council recently passed a rule requiring employers to inform candidates when they're being assessed by AI -- and to audit the algorithms every year. An Illinois law requires consent from candidates for analysis of video footage, and Maryland has banned the use of facial analysis altogether.

Some vendors have proactively stopped offering or placed guardrails around their emotion AI services. HireVue announced that it'd stop using visual analysis in its algorithms. And Microsoft, which initially claimed its sentiment-detecting Face API could detect expressions across cultures, now notes in a disclaimer that "facial expressions alone do not represent the internal states of people."

As for Hume, Cook's read is that The Hume Initiative "made some ethics documents so people don't worry about what [Hume is] doing."

"[Perhaps] the biggest issue I have is I can't tell what they're doing. The part that's public ... doesn't seem to have anything on it apart from some datasets they made," Cook said.