Researchers identify dozens of words that accidentally trigger Amazon Echo speakers

As voice assistants like Google Assistant and Alexa increasingly make their way into internet of things devices, it's becoming harder to track when audio recordings are sent to the cloud and who might gain access to them. To spot transgressions, researchers at the University of Darmstadt, North Carolina State University, and the University of Paris Saclay developed LeakyPick, a platform that periodically probes microphone-equipped devices and monitors subsequent network traffic for patterns indicating audio transmission. They say LeakyPick identified "dozens" of words that accidentally trigger Amazon Echo speakers.

Voice assistant usage might be on the rise -- Statista estimated that 4.25 billion assistants were in use in devices around the world as of 2019 -- but privacy concerns haven't abated. Reporting has revealed that accidental activations have exposed contract workers to private conversations. The risk is such that law firms including Mishcon de Reya have advised staff to mute smart speakers when they talk about client matters at home.

LeakyPick is designed to identify hidden voice audio recordings and transmissions as well as to detect potentially compromised devices. The researchers' prototype, which was built on a Raspberry Pi for less than $40, operates by periodically generating audible noises when a user isn't home and monitoring traffic using a statistical approach that's applicable to a range of voice-enabled devices.

LeakyPick -- which the researchers claim is 94% accurate at detecting speech traffic -- works for both devices that use a wakeword and those that don't, like security cameras and smoke alarms. In the case of the former, it's preconfigured to prefix probes with known wakewords and noises (e.g., "Alexa," "Hey Google"), and on the network level, it looks for "bursting," where microphone-enabled devices that don't typically send much data cause increased network traffic. A statistical probing step serves to filter out cases where bursts result from non-audio transmissions.
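
As a rough illustration of the "bursting" idea, the sketch below flags stretches where a device's per-second traffic rises well above its idle baseline. It is not the researchers' implementation: the function name `find_bursts`, the mean-plus-`k`-standard-deviations threshold, and the `min_len` parameter are all assumptions made for this example, and the separate statistical probing step is not shown.

```python
from statistics import mean, pstdev


def find_bursts(bytes_per_second, baseline, k=3.0, min_len=2):
    """Return (start, end) index pairs where traffic exceeds the baseline.

    A sample counts as "high" when it is above mean(baseline) + k * std(baseline).
    Only runs of at least `min_len` consecutive high samples are reported.
    The threshold rule is an assumption for illustration, not the paper's.
    """
    threshold = mean(baseline) + k * pstdev(baseline)
    bursts = []
    start = None  # index where the current run of high samples began
    for i, count in enumerate(bytes_per_second):
        if count > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                bursts.append((start, i - 1))
            start = None
    # Close a run that lasts until the end of the capture.
    if start is not None and len(bytes_per_second) - start >= min_len:
        bursts.append((start, len(bytes_per_second) - 1))
    return bursts


# Made-up byte counts: idle traffic, then a sustained spike after a probe.
idle = [100, 120, 90, 110, 100]
observed = [100, 110, 5000, 6000, 5500, 120, 90, 4000, 100]
bursts = find_bursts(observed, idle)
```

With these invented numbers, the three-second spike is reported as one burst, while the isolated one-second spike is ignored because it is shorter than `min_len`.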

To identify words that might mistakenly trigger a voice recording, LeakyPick uses all words in a phoneme dictionary with the same or similar phoneme count compared with actual wakewords. (Phonemes are the perceptually distinct units of sound in a language that distinguish one word from another, such as p, b, d, and t in the English words pad, pat, bad, and bat.) LeakyPick also verbalizes random words from a simple English word list.
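
The phoneme-count filter can be sketched in a few lines. The dictionary below is a tiny, hand-written set of ARPAbet-style entries included purely for illustration; the real system draws on a full phoneme dictionary, and the `tolerance` parameter and function name `candidate_words` are assumptions of this example rather than details from the paper.

```python
# Hypothetical, hand-written entries for illustration only; not an
# authoritative pronunciation source.
PHONEMES = {
    "alexa": ["AH", "L", "EH", "K", "S", "AH"],
    "lechner": ["L", "EH", "K", "N", "ER"],
    "election": ["IH", "L", "EH", "K", "SH", "AH", "N"],
    "cat": ["K", "AE", "T"],
    "unacceptable": ["AH", "N", "AE", "K", "S", "EH", "P", "T", "AH", "B", "AH", "L"],
}


def candidate_words(wakeword, dictionary, tolerance=1):
    """Return words whose phoneme count is within `tolerance` of the wakeword's."""
    target = len(dictionary[wakeword])
    return sorted(
        word
        for word, phones in dictionary.items()
        if word != wakeword and abs(len(phones) - target) <= tolerance
    )


candidates = candidate_words("alexa", PHONEMES)
```

Here "alexa" has six phonemes, so words with five to seven phonemes become probe candidates and the much shorter or longer entries are dropped. Candidates selected this way would still need to be played to the device to see whether they actually trigger it.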

The researchers tested LeakyPick with an Echo Dot, a Google Home, a HomePod, a Netatmo Welcome and Presence, a Nest Protect, and a Hive Hub 360, using a Hive View to evaluate its performance. After creating baseline burst and statistical probing data sets, they monitored the eight devices' live traffic and randomly selected a set of 50 words out of the 1,000 most-used words in the English language combined with a list of known wakewords of voice-activated devices. Then they had users in three households interact with the three smart speakers -- the Echo Dot, HomePod, and Google Home -- over a period of 52 days.

The team measured LeakyPick's accuracy by recording timestamps of when the devices began listening for commands, taking advantage of indicators like the LED ring around the Echo Dot. A light sensor enabled LeakyPick to mark each time the devices were activated, while a 3-watt speaker connected to the Pi via an amplifier generated sound and a Wi-Fi USB dongle captured network traffic.
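
One plausible way to score such a setup is to pair each light-sensor timestamp with a traffic burst that starts shortly afterward. The sketch below does this with a simple greedy match; the function name `match_activations`, the five-second `window`, and the sample timestamps are assumptions for illustration, not the procedure described in the paper.

```python
def match_activations(led_times, burst_times, window=5.0):
    """Greedily pair each LED activation with the earliest unused burst
    that starts within `window` seconds after it.

    Returns (matched, missed_activations, unmatched_bursts).
    """
    remaining = sorted(burst_times)
    matched = 0
    for led in sorted(led_times):
        for i, burst in enumerate(remaining):
            if 0 <= burst - led <= window:
                matched += 1
                del remaining[i]  # each burst can explain only one activation
                break
    return matched, len(led_times) - matched, len(remaining)


# Made-up timestamps in seconds since the start of a capture.
led_on = [10.0, 60.0, 120.0]
burst_start = [11.2, 61.0, 200.0]
result = match_activations(led_on, burst_start)
```

In this invented example, two activations line up with bursts, one activation has no matching traffic, and one burst has no matching activation.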

In one experiment intended to test LeakyPick's ability to identify unknown wakewords, the researchers configured the Echo Dot to use the standard "Alexa" wakeword and had LeakyPick play different audio inputs, waiting for two seconds to ensure the smart speaker "heard" the input. According to the researchers, the Echo Dot "reliably" reacted to 89 words across multiple rounds of testing, some of which were phonetically very different from "Alexa," like "alachah," "lechner," and "electrotelegraphic."

An Amazon spokesperson said the company is in the process of reviewing the research. "Unfortunately, we were not given the opportunity to review the methodology behind this study to validate the accuracy of these claims prior to publication," they told VentureBeat via email. "However, we can assure you that we have built privacy deeply into the Alexa service, and our devices are designed to wake up only after detecting the wake word. Customers talk to Alexa billions of times a month and in rare cases devices may wake up after hearing a word that sounds like 'Alexa' or one of the other available wake words. By design, our wake word detection and speech recognition get better every day -- as customers use their devices, we optimize performance."

All 89 words caused audio recordings to be streamed to Amazon -- findings that aren't surprising in light of another study identifying 1,000 phrases that incorrectly trigger Alexa-, Siri-, and Google Assistant-powered devices. The coauthors of that paper, which has yet to be published, told Ars Technica the devices in some cases send the audio to remote servers where "more robust" checking mechanisms also mistake the words for wakewords.

"As smart home IoT devices increasingly adopt microphones, there is a growing need for practical privacy defenses," the LeakyPick creators wrote. "LeakyPick represents a promising approach to mitigate a real threat to smart home privacy."
