The AI underlying assistants like Alexa gets better in part through manual data transcription and annotation, which takes outsized time and effort. In pursuit of a more scalable approach, scientists at Amazon — noting that people tend to reformulate misinterpreted commands — leveraged feedback from interactions to glean insights. In a paper detailing their work, they say that the automated self-learning system they deployed reduced errors across “millions” of Alexa customers.
It’s yet another step for Amazon along the way to a largely unsupervised and more human-like Alexa, as scientists and product managers from the company told VentureBeat in September. Such techniques have imbued Alexa with better contextual understanding of its surroundings with respect to smart home devices, as well as the ability to detect emotions like frustration in users’ voices.
As the researchers note, assistants like Alexa are far from perfect. Errors arise from automatic speech recognition (ASR), where an utterance like “Play Imagine Dragons” could be misinterpreted as “Play maj and dragons.” Natural language understanding (NLU) errors include examples like “Don’t play this song again, skip,” which Alexa would understand only if it were phrased as “Thumbs down this song.” And then there are comprehension issues, like “Play Bazzi Angel” rather than “Play Beautiful by Bazzi.” Tackling these challenges required developing a “query rewriting” technique that reformulates voice commands to convey the same meaning.
At a high level, Alexa comprises three components: an ASR system, an NLU system with a built-in dialog manager, and a text-to-speech (TTS) system. ASR recognizes a user’s speech and decodes it into plain text (an utterance), which the NLU module interprets, accounting for the state of the user’s active dialog session, before passing it to the TTS along with the corresponding action to execute. The TTS then generates the appropriate spoken response for the user, closing the interaction loop.
The researchers’ self-learning system intercepts utterances being passed on to the NLU component and rewrites them with a reformulation engine. (This engine draws on a high-performance, low-latency database that’s queried with the original utterance to yield its corresponding rewrite candidate.) The rewrite is then passed back to the NLU component for interpretation, restoring the original data flow.
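As a rough illustration of that data flow, the Python sketch below places a lookup step between ASR output and NLU. It is not Amazon's implementation: the dictionary stands in for the low-latency database, and the names `REWRITES`, `normalize`, `rewrite`, and `handle` are hypothetical, as is the lowercase-and-trim normalization.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the low-latency rewrite database.
# The single entry reuses the ASR error example from the article.
REWRITES: Dict[str, str] = {
    "play maj and dragons": "play imagine dragons",
}


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(utterance.lower().split())


def rewrite(utterance: str, table: Dict[str, str] = REWRITES) -> str:
    """Return the stored rewrite candidate, or the utterance unchanged."""
    return table.get(normalize(utterance), utterance)


def handle(utterance: str, nlu: Callable[[str], str]) -> str:
    """Intercept the ASR text, rewrite it, then hand it to NLU as usual."""
    return nlu(rewrite(utterance))
```

An utterance with no entry in the table passes through untouched, so in this sketch the rewrite step can only change requests it has a stored reformulation for.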
According to the research team, the engine ingests anonymized Alexa log data from “millions” of customers on a daily basis to learn from users’ reformulations and updates the database, enabling it to maintain the viability of existing rewrites. (Automated jobs mine the “thousands” of new utterances per day to identify the most recent and serve them to users.) An offline blacklisting mechanism evaluates rewrites by independently comparing their friction rate against that of the original utterance, where friction is detected using a pretrained AI model.
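The blacklisting check can be pictured as a comparison of two friction rates. The Python sketch below is only an approximation under stated assumptions: the counts would in practice come from the pretrained friction model, and the `Stats` class, the `should_blacklist` function, and the `min_interactions` threshold are hypothetical details not described in the article.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Interaction counts for one utterance (hypothetical structure)."""
    interactions: int
    frictions: int  # interactions flagged as friction by some detector

    @property
    def friction_rate(self) -> float:
        # Avoid dividing by zero when nothing has been observed yet.
        return self.frictions / self.interactions if self.interactions else 0.0


def should_blacklist(original: Stats, rewritten: Stats,
                     min_interactions: int = 50) -> bool:
    """Blacklist a rewrite that causes at least as much friction as the original.

    The minimum-sample threshold is an assumption added here so that a
    rewrite is not judged on a handful of interactions.
    """
    if rewritten.interactions < min_interactions:
        return False
    return rewritten.friction_rate >= original.friction_rate
```

For instance, a rewrite with 40 friction events in 100 interactions would be blacklisted when the original utterance shows only 20 in 100.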
Both explicit and implicit feedback inform the system. Explicit feedback here refers to corrective or reinforcing feedback from direct user engagement, the researchers say, principally events where users opt to interrupt Alexa with an interjection. Implicit feedback includes cases where users abandon a session following Alexa’s failure to handle a request due to an exception or some other error.
Amazon says that in the nine months since the system was deployed in production, it has led to a 30% reduction in defect rate. “We have been running this application for over nine months in production, and it has been serving millions of users since, improving their experience on a daily basis without getting in their way,” wrote the paper’s authors.