Emotion detection is a hot ask in marketing, but the tech just isn't ready yet

As the chief scientist of an AI and natural language processing company, I don’t just work with engineers and data scientists. I also talk with the heads of marketing departments daily. One of the questions I’ve been hearing a lot lately is, “How can we use AI to better understand consumers’ emotional states?”

This field of AI, generally referred to as emotion detection or recognition technology, or emotion analysis, is being put to use in this age of coronavirus. Just this month, researchers from University College London published a paper purporting to accurately approximate the emotional states of research participants using automated analysis of text responses to questions regarding the pandemic. Without getting into the specifics of that study, it underscores the continued growth of the technology and the new ways it is being applied.

It’s understandable that marketers would want to use emotion detection in their work. Marketing is about more than just touting speeds and feeds, features, and benefits. Marketing involves eliciting an emotional response from consumers in the hopes that the product, service, or brand will resonate deeply. If a marketer can get an accurate read on someone’s emotional response, they can tweak their campaigns and achieve better results in customer acquisition or retention.

Currently, marketers are using the technology primarily in research and focus group applications. For example, the makers of a certain breakfast cereal have used the technology to decide which video ads to run to their core audience. Participants in the study were shown several different versions, and the company chose the ad that elicited the most engaged responses as determined by the algorithms. You can imagine how this technology could be deployed across millions of camera-enabled PCs, gaming consoles, or TVs to track consumer reactions in a similar way. In the realm of text, a social media platform could start rewarding advertisers differently based on the perceived emotional reactions of consumers as determined by the text they leave in the comments sections.

However, for emotion data to be useful, it needs to be accurate. And unfortunately, we are a long way from being able to get an accurate read of human emotion using algorithms, whether it’s in my field of specialization, text, or in emergent and increasingly controversial categories like video and image recognition.

Most of Big Tech, as well as a slew of startups, are marketing and selling capabilities in this area, and it has become a $20 billion market. With the rise of deep learning models, increasing computer processing power, the availability of large data sets of social content from companies like Twitter and Reddit, and the explosion in digital video, image, and audio content, accuracy levels have been rising over the past several years.

The problem is the technology just isn’t accurate enough, and the potential pitfalls outweigh the benefits. In the case of text data and written communication, we’ve all experienced on a personal level what this looks like: sometimes we message a loved one with a sarcastic note or joke that we thought was funny but that ends up being misunderstood and hurts the recipient’s feelings. Now imagine the complexities of having a machine interpret the writer’s intent.

Humans are better, although still far from perfect, at detecting emotion in a person when they can take in facial and vocal clues as well as the meaning of the spoken words and the context they’re expressed in. And with recent advances in deep learning, these modalities have become more amenable to machine learning.

Still, one only needs to look at real life to understand why it might be difficult to determine emotions even with these clues. There may be ethnic or cultural differences around how emotions are expressed, for example. Some research has found humans capable of making accurate judgments of emotion across cultures; however, these studies frequently involve actors or participants “making a specific face” on cue. These stereotyped facial expressions (like the lips curved down in a sad pout) are broadly understood, but the evidence based on the faces we make in practice is far less clear. The eyebrow lift of greeting or agreement in the Philippines, for example, might come across as skepticism or surprise in other cultures, and the figure-eight head wag common in India, which indicates agreement, might be perceived in other cultures as a head shaking “no.”

People express their internal emotions differently all the time, through nuances and context-based cues that many humans can’t even perceive, let alone machine learning algorithms.

In theory, a machine learning model could be trained on a very tightly defined context in a homogeneous population, but that’s unlikely to deliver a practical solution that is usable in the real world. Emotion detection technology will likely deliver only a best guess that will require a human to make the final determination, and sometimes those humans will get it wrong too.

There have been two reports released over the past 10 months, each generating a slew of media coverage, which I think is a key reason I’m starting to get more questions now about emotion recognition. The first high-profile look was a study released last summer from Northeastern University that concluded, “Facial configurations … are not ‘fingerprints’ or diagnostic displays that reliably and specifically signal particular emotional states regardless of context, person, and culture. It is not possible to confidently infer happiness from a smile, anger from a scowl, or sadness from a frown, as much of current technology tries to do when applying what are mistakenly believed to be the scientific facts.”

The second was an annual report from the AI Now Institute at NYU released in December advising that emotion recognition be banned when used in making decisions that affect people’s everyday lives and access to opportunities.

A common refrain from data scientists is that you can make up for inaccuracies in measurement by processing more data in the hope that the mistakes will average out. This, however, is only true when the errors are random. If an emotion detecting algorithm works differently by race, gender, age group, education, or some other demographic factor, that difference will not wash out. It will actually be magnified relative to the shrinking random noise as more data is added.
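To make the distinction between random and systematic error concrete, here is a minimal simulation sketch in Python. Everything in it is a hypothetical illustration rather than a real emotion detector or real data: the function name `mean_measured_score`, the "true" score of 0.5, the noise level, and the -0.2 offset applied to one group are all assumptions chosen for the example.

```python
import random
import statistics


def mean_measured_score(n, bias, noise_sd=1.0, true_score=0.5, seed=0):
    """Average n simulated readings of the same underlying emotion score.

    Each reading is the true score, plus a fixed offset (`bias`, a
    hypothetical systematic error for one demographic group), plus
    zero-mean Gaussian noise (the random error).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    readings = [true_score + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)


TRUE_SCORE = 0.5  # assumed ground truth, identical for both groups

for n in (100, 10_000, 200_000):
    # Group with random error only: the average drifts toward the truth.
    random_only = mean_measured_score(n, bias=0.0)
    # Group with an added systematic offset: the average settles on the wrong value.
    systematic = mean_measured_score(n, bias=-0.2, seed=1)
    print(
        f"n={n:>7}  "
        f"random-only error={abs(random_only - TRUE_SCORE):.4f}  "
        f"systematic error={abs(systematic - TRUE_SCORE):.4f}"
    )
```

In a run of this sketch, the error for the random-only group should shrink as the sample grows, while the error for the group with the built-in offset should settle near the size of that offset no matter how many readings are added. More data makes the biased answer look more precise, not more correct.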

While the implications of the error rates and potential for problematic biases are far more consequential in domains like law enforcement, hiring, and psychiatry than they are in marketing, marketers should be equally cautious about using this technology. Emotion expression is a complex category, whether it’s in written or visual communication. While we should and will continue to research and advance the field, we also need to be vigilant in the use and deployment of the technology, because we’re just not there yet.

Paul Barba is Chief Scientist at Lexalytics.
