Amazon debuted a bunch of new and refreshed devices (11, to be exact) at its hardware event in Seattle earlier this year, but one of the coolest new features — Whisper Mode — is hitting Alexa devices this week. The company said that starting today, speakers and smart home appliances powered by Alexa, its virtual assistant, will respond to whispered speech by whispering back.
It works in U.S. English and is rolling out to users in the U.S., but it isn’t enabled by default. To switch it on, head to Settings > Alexa Account > Alexa Voice Responses > Whispered Responses in the Alexa companion app, or say, “Alexa, turn on whisper mode.”
Amazon and Alexa developers have long been able to make Alexa whisper using SSML tags, but Whisper Mode requires no markup — it kicks in automatically. In a blog post published earlier this month, Zeynab Raeesy, a speech scientist in Amazon’s Alexa Speech group, detailed its artificial intelligence (AI) underpinnings.
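For context, here is a minimal sketch of the pre-existing, tag-based approach: an Alexa skill can wrap its reply in SSML so the assistant whispers that passage. The `<amazon:effect name="whispered">` tag is part of Alexa’s SSML support; the helper function and reply text below are illustrative, not Amazon code.

```python
def build_whispered_response(text: str) -> dict:
    """Return an Alexa-skill JSON response whose speech is whispered via SSML."""
    # Alexa's SSML extension <amazon:effect name="whispered"> makes the
    # enclosed text come out as a whisper.
    ssml = f'<speak><amazon:effect name="whispered">{text}</amazon:effect></speak>'
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }
```

The difference with Whisper Mode is that no skill author has to opt in this way: Alexa decides to whisper on her own, based on how the user spoke.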
In essence, Whisper Mode uses a neural network — layers of mathematical functions loosely modeled after the human brain’s neurons — to distinguish between normal and whispered words. That’s more challenging than it sounds; whispered speech is predominantly unvoiced — that is to say, it doesn’t involve the vibration of the vocal cords — and tends to have less energy in lower frequency bands than ordinary speech.
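Amazon hasn’t published the exact features its network consumes, but the acoustic cue Raeesy describes — whispered speech carrying less energy in the lower frequency bands — can be illustrated with a toy heuristic. Everything below (function names, the 1 kHz cutoff, the 0.5 threshold, the stand-in signals) is my own illustration, not Amazon’s method: it measures what fraction of a signal’s spectral energy sits below a cutoff, since voiced speech concentrates energy in low bands while whisper-like noise spreads it evenly.

```python
import math
import random

def low_band_energy_fraction(samples, sample_rate, cutoff_hz=1000.0):
    """Fraction of spectral energy below cutoff_hz, via a naive DFT."""
    n = len(samples)
    cutoff_bin = int(cutoff_hz * n / sample_rate)
    energies = []
    for k in range(n // 2):  # real signal: bins up to the Nyquist frequency
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        energies.append(re * re + im * im)
    total = sum(energies) or 1.0
    return sum(energies[:cutoff_bin]) / total

def looks_whispered(samples, sample_rate, threshold=0.5):
    """Heuristic: whispered (unvoiced) speech has relatively little low-band energy."""
    return low_band_energy_fraction(samples, sample_rate) < threshold

# Stand-ins: a 150 Hz tone for voiced speech, broadband noise for a whisper.
fs = 8000
voiced = [math.sin(2 * math.pi * 150 * t / fs) for t in range(256)]
rng = random.Random(0)
whisper = [rng.gauss(0.0, 1.0) for _ in range(256)]
```

A production system would feed log filter-bank features into a neural classifier rather than thresholding a single energy ratio, but the ratio captures the cue Raeesy describes.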
“If you’re in a room where a child has just fallen asleep, and someone else walks in, you might start speaking in a whisper, to indicate that you’re trying to keep the room quiet. The other person will probably start whispering, too,” Raeesy wrote. “We would like Alexa to react to conversational cues in just such a natural, intuitive way.”
Whisper Mode isn’t Amazon’s first foray into AI-assisted voice analysis. In a briefing with reporters late last year, Alexa chief scientist Rohit Prasad said Amazon’s Alexa team was beginning to compare the sounds of users’ voices to recognize moods and emotional states.
“It’s early days for this, because detecting frustration and emotion on far-field audio is hard, plus there are human baselines you need to know to understand if I’m frustrated. Am I frustrated right now? You can’t tell unless you know me,” Prasad told VentureBeat. “With language, you can already express ‘Hey, Alexa play upbeat music’ or ‘Play dance music.’ Those we are able to handle from explicitly identifying the mood, but now where we want to get to is a more implicit place from your acoustic expressions of your mood.”
And its debut dovetails with another machine learning-powered feature introduced earlier this year: Hunches. With Hunches, Alexa can volunteer information based on what it knows about connected devices or sensors on the local network. For example, if you say “Alexa, good night,” Alexa might respond, “By the way, your living room light is on. Do you want me to turn it off?”
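The Hunch pattern described above — check locally known device state when the user signals a routine, and append a follow-up only if something looks amiss — can be sketched as follows. The device names, state strings, and the `good_night_hunch` helper are all hypothetical, not Amazon’s API.

```python
def good_night_hunch(device_states):
    """Return a follow-up prompt if a light was left on, else None.

    device_states: hypothetical mapping of device name -> state string.
    """
    lights_on = [name for name, state in device_states.items()
                 if state == "on" and "light" in name]
    if lights_on:
        return (f"By the way, your {lights_on[0]} is on. "
                "Do you want me to turn it off?")
    return None  # nothing amiss: no hunch, just say good night
```

The key design point is that the hunch is conditional: most of the time the routine ends normally, and the extra question surfaces only when the device state diverges from what the assistant expects at bedtime.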