Amazon's Alexa learns how to recognize irrelevant questions

Distinguishing between the relevant and irrelevant bits of a conversation is a good life skill in general, but for voice assistants like Amazon's Alexa, it's indispensable. In order to respond appropriately to what's being said -- about anything from the weather to a nearby restaurant or a package in transit -- they need to know whether the subject at hand is beyond their knowledge scope.

Researchers at Amazon tackled the problem with a natural language understanding (NLU) system that simultaneously recognizes in-domain (known) and out-of-domain (unknown) topics. The results will be presented at this year's Interspeech conference in Hyderabad, India, in early September.

"Sometimes ... an Alexa customer might say something that doesn’t fit into any domain," Yong-Bum Kim, a scientist within Amazon's Alexa team and a lead author on the paper, wrote in a blog post. "It may be an honest request for a service that doesn’t exist yet, or it might be a case of the customer’s thinking out loud: 'Oh wait, that’s not what I wanted.' If a natural-language-understanding (NLU) system tries to assign a domain to an out-of-domain utterance, the result is likely to be a nonsensical response."

The team began by assembling two datasets comprising utterances (i.e., voice commands): one covering 21 different domains and the other sampled from 1,500 frequently used Alexa skills.

When it came to choosing a model, they settled on a bidirectional long short-term memory (Bi-LSTM) architecture that (1) factored in the order in which the utterances were received and (2) considered the data sequences both forward and backward. They fed it both "word-level" and "character-level" information: embeddings, or points in a 100-dimensional space that represent words, along with the words' constituent characters.

The neural network produced a vector summarizing the useful features of the individual characters, which the team combined with the aforementioned embeddings before passing them to a second Bi-LSTM. This second network learned to produce a summary of the entire input.
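For readers who want to see the shape of that two-stage encoder, here is a minimal sketch in plain Python. It is not Amazon's implementation: the weights are random and untrained, and the class name `UtteranceEncoder`, the helper functions, and every size other than the 100-dimensional word embeddings are assumptions chosen for illustration. It only shows how a character-level Bi-LSTM summary can be joined to a word embedding and fed to a second Bi-LSTM that summarizes the whole utterance.

```python
import math
import random

WORD_DIM = 100    # word-embedding size mentioned in the article
CHAR_DIM = 8      # assumption: character-embedding size
CHAR_HIDDEN = 6   # assumption: hidden size of the character-level LSTM
UTT_HIDDEN = 10   # assumption: hidden size of the utterance-level LSTM


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_size, hidden_size, rng):
    """Random, untrained weights for one LSTM direction.

    Four gates (input, forget, output, candidate), each a matrix of
    hidden_size rows over [input, previous hidden state, bias].
    """
    cols = input_size + hidden_size + 1
    return [[[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_size)] for _ in range(4)]


def lstm_final_state(weights, inputs, hidden_size):
    """Run a standard LSTM cell over the inputs and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in inputs:
        z = x + h + [1.0]  # concatenate input, previous hidden state, bias term
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in weights]
        i = [sigmoid(p) for p in pre[0]]
        f = [sigmoid(p) for p in pre[1]]
        o = [sigmoid(p) for p in pre[2]]
        g = [math.tanh(p) for p in pre[3]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def bilstm_summary(fwd, bwd, inputs, hidden_size):
    """Read the sequence forward and backward and join the two final states."""
    return (lstm_final_state(fwd, inputs, hidden_size)
            + lstm_final_state(bwd, inputs[::-1], hidden_size))


class UtteranceEncoder:
    """Hypothetical two-stage encoder: characters -> words -> utterance."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.word_emb = {}
        self.char_emb = {}
        self.char_fwd = make_lstm(CHAR_DIM, CHAR_HIDDEN, self.rng)
        self.char_bwd = make_lstm(CHAR_DIM, CHAR_HIDDEN, self.rng)
        utt_in = WORD_DIM + 2 * CHAR_HIDDEN
        self.utt_fwd = make_lstm(utt_in, UTT_HIDDEN, self.rng)
        self.utt_bwd = make_lstm(utt_in, UTT_HIDDEN, self.rng)

    def _lookup(self, table, key, dim):
        # Assumption: unseen words and characters get a fresh random embedding.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[key]

    def encode(self, utterance):
        word_inputs = []
        for word in utterance.lower().split():
            chars = [self._lookup(self.char_emb, ch, CHAR_DIM) for ch in word]
            # Stage 1: summarize the word's characters.
            char_summary = bilstm_summary(self.char_fwd, self.char_bwd,
                                          chars, CHAR_HIDDEN)
            word_vec = self._lookup(self.word_emb, word, WORD_DIM)
            # Combine the word embedding with its character summary.
            word_inputs.append(word_vec + char_summary)
        # Stage 2: summarize the entire utterance.
        return bilstm_summary(self.utt_fwd, self.utt_bwd,
                              word_inputs, UTT_HIDDEN)


encoder = UtteranceEncoder(seed=0)
summary = encoder.encode("play some jazz")
print(len(summary))
```

In a trained system, a vector like `summary` would be the input to the classifiers that decide which domain an utterance belongs to and whether it is out of domain; that part is left out of this sketch.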

On average, the researchers' system improved classification accuracy by 6 percent for a given target. The gap was wider when they trained the system on the 21-domain dataset: 90.4 percent accuracy, compared with the existing system's 83.7 percent.

"By using a training mechanism that iteratively attempts to optimize the trade-off between those two goals, we significantly improve on the performance of a system that features a separately trained domain classifier and out-of-domain classifier," Kim wrote. "[The] domain classification makes ... determinations [such as the actions that a customer wants executed] much more efficient ... by narrowing the range of possible interpretations."
