Amazon’s Alexa team is beginning to analyze the sound of users’ voices to recognize their mood or emotional state, Alexa chief scientist Rohit Prasad told VentureBeat. Doing so could let Amazon personalize and improve customer experiences, lead to lengthier conversations with the AI assistant, and even open the door to Alexa one day responding to queries based on your emotional state or scanning voice recordings to diagnose disease.
Tell Alexa that you’re happy or sad today and she can deliver a pre-programmed response. In the future, Alexa may be able to pick up your mood without being told. The voice analysis effort will begin by teaching Alexa to recognize when a user is frustrated.
“It’s early days for this, because detecting frustration and emotion on far-field audio is hard, plus there are human baselines you need to know to understand if I’m frustrated. Am I frustrated right now? You can’t tell unless you know me,” Prasad told VentureBeat in a gathering with reporters last month. “With language, you can already express ‘Hey, Alexa play upbeat music’ or ‘Play dance music.’ Those we are able to handle from explicitly identifying the mood, but now where we want to get to is a more implicit place from your acoustic expressions of your mood.”
An Amazon spokesperson declined to comment on the kinds of moods or emotions Amazon may attempt to detect beyond frustration, and declined to share a timeline for when Amazon may seek to expand its deployment of sentiment analysis.
An anonymous source speaking with MIT Tech Review last year shared some details of Amazon’s plans to track frustration and other emotions, calling it a key area of research and development that Amazon is pursuing as a way to stay ahead of competitors like Google Assistant and Apple’s Siri.
Amazon’s Echo devices record an audio file of every interaction after the microphone hears the “Alexa” wake word. Each of these interactions can be used to create a baseline of your voice. Today, these recordings are used to improve Alexa’s natural language understanding and ability to recognize your voice.
To deliver personalized results, Alexa can also take into consideration things like your taste in music, zip code, or favorite sports teams.
Emotion detection company Affectiva is able to detect things like laughter, anger, and arousal from the sound of a person’s voice. It offers its services to several Fortune 1000 businesses, as well as the makers of social robots and AI assistants. Mood tracking will change the way robots and AI assistants like Alexa interact with humans, Affectiva CEO Rana el Kaliouby told VentureBeat in a phone interview.
Emotional intelligence is key to allowing devices with a voice interface to react to user responses and have a meaningful conversation, el Kaliouby said. Today, for example, Alexa can tell you a joke, but she can’t react based on whether you laughed at the joke.
“There’s a lot of ways these things [conversational agents] can persuade you to lead more productive, healthier, happier lives. But in my opinion, they can’t get there unless they have empathy, and unless they can factor in the considerations of your social, emotional, and cognitive state. And you can’t do that without affective computing, or what we call ‘artificial emotional intelligence’,” she said.
Personalization AI is currently at the heart of many modern tech services, like the listings you see on Airbnb or matches recommended to you on Tinder, and it’s an increasing part of the Alexa experience.
Voice signatures for recognizing up to 10 distinct user voices in a household and Routines for customized commands and scheduled actions both made their debut in October. Developers will be given access to voice signature functionality for more personalization in early 2018, Amazon announced at AWS re:Invent last month.
Emotional intelligence for longer conversations
Today, Alexa is limited in her ability to engage in conversations. No matter the subject, most interactions seem to last just a few seconds after she recognizes your intent.
To learn how to improve the AI assistant’s ability to carry out the back-and-forth volley that humans call conversation, Amazon last year created the Alexa Prize to challenge university teams to make bots that can maintain a conversation for 20 minutes. To speak with one of three 2017 Alexa Prize finalist bots, say “Alexa, let’s chat.”
Since the command was added in May, finalists have racked up more than 40,000 hours of conversation.
These finalists had access to conversation text transcripts for analysis, but not voice recordings. Amazon is considering giving text transcripts to all developers in the future, according to a report from The Information.
In addition to handing out $2.5 million in prize money, Amazon published the findings of more than a dozen social bots on the Alexa Prize website. Applications for the 2018 Alexa Prize are due January 8.
In September, while taking part in a panel titled “Say ‘Hello’ to your new AI family member,” Alexa senior manager Ashwin Ram suggested that someday Alexa could help combat loneliness, an affliction that is considered a growing public health risk.
In response to a question about the kind of bots he wants to see built, Ram said, “I think that the app that I would want is an app that takes these things from being assistants to being magnanimous, being things we can talk to, and you imagine it’s not just sort of a fun thing to have around the house, but for a lot of people that would be a lifesaver.” He also noted: “The biggest problem that senior citizens have, the biggest health problem, is loneliness, which leads to all kinds of health problems. Imagine having someone in the house to talk to — there’s plenty of other use cases like that you can imagine — so I would want a conversationalist.”
The Turing Test, which determines whether a bot can convince a human they’re speaking to another human, was not used to judge Alexa Prize finalists, Ram said, because people already know Alexa isn’t human but still attempt to have conversations with her about any number of topics.
“We deliberately did not choose the Turing test as the criteria because it’s not about trying to figure out if this thing is human or not. It’s about building a really interesting conversation, and I imagine that as these things become intelligent, we’ll not think of them as human, but [we’ll] find them interesting anyway.”
Microsoft Cortana lead Jordi Ribas, who also took part in the panel, agreed with Ram, saying that for the millions of people who speak with Microsoft-made bots every month, the Turing Test moment has already passed, or users simply don’t care that they’re speaking to a machine.
Voice analysis for health care
While the idea of making Alexa a digital member of your family or giving Amazon the ability to detect loneliness may concern a lot of people, Alexa is already working to respond when users choose to share their emotional state. Working with a number of mental health organizations, Amazon has created responses for various mental health emergencies.
Alexa can’t make 911 calls (yet), but if someone tells Alexa that they want to commit suicide, she will suggest they call the National Suicide Prevention Lifeline. If they say they are depressed, Alexa will share suggestions and another 1-800 number. If Alexa is trained to recognize your voice signature baseline, she could be more proactive in these situations and speak up when you don’t sound well or you deviate from your baseline.
AI assistants like Alexa have sparked a fair number of privacy concerns, but these assistants promise interesting benefits, as well. Smart speakers analyzing the sound of your voice may be able to detect not just emotion but unique biomarkers associated with specific diseases.
A collection of researchers, startups, and medical professionals are entering the voice analysis field, as voice is thought to have unique biomarkers for conditions like traumatic brain injury, cardiovascular disease, depression, dementia, and Parkinson’s disease.
The U.S. government today uses tone detection tech from Cogito not only to train West Point cadets in negotiation, but also to determine the emotional state of active duty service members or veterans with PTSD.
Based in Israel, emotion detection startup Beyond Verbal is currently doing research with the Mayo Clinic to identify heart disease from the sound of someone’s voice. Last year, Beyond Verbal launched a research platform to collect voice samples of people with afflictions thought to be detectable through voice, such as Parkinson’s and ALS.
After being approached by pharmaceutical companies, Affectiva has also considered venturing into the health care industry. CEO Rana el Kaliouby thinks emotionally intelligent AI assistants or robots could be used to detect disease and reinforce healthy behavior, but says there’s still a fair amount of work to be done to make this possible. She imagines the day when an AI assistant could help keep an eye on her teenage daughter.
“If she gets a personal assistant, when she’s 30 years old that assistant will know her really well, and it will have a ton of data about my daughter. It could know her baseline and should be able to flag if Jana’s feeling really down or fatigued or stressed. And I imagine there’s good to be had from leveraging that data to flag mental health problems early on and get the right support for people.”