Developing an AI system capable of understanding natural language isn’t just time-consuming — it’s really expensive. Developers have to collect thousands of voice samples and annotate them by hand, a process that often takes weeks. That’s why researchers at Amazon’s Alexa division pursued transfer learning, which leverages a neural network — i.e., layers of mathematical functions that mimic neurons in the brain — trained on a large dataset of previously annotated samples to bootstrap training in a new domain with sparse data.
In a newly published paper (“Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents”), Alexa AI scientists describe a technique that taps millions of unannotated interactions with Amazon’s voice assistant to reduce errors by 8 percent. They’ll present the fruit of their labor at the Association for the Advancement of Artificial Intelligence (AAAI) conference in Honolulu, Hawaii later this year.
These interactions were used to train an AI system to generate embeddings — numerical representations of words — such that words with similar functions were grouped closely together. As explained by Anuj Goyal, an applied scientist at Alexa AI and a coauthor on the study, embeddings tend to group words by their “co-occurrence” with other words — that is, how frequently they appear alongside each other in a particular order.
“The more co-occurring words two words have in common, the closer they are in the embedding space,” Goyal wrote in a blog post. “Embeddings thus capture information about words’ semantic similarities without requiring human annotation of training data.”
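To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not the researchers' method: their embeddings are learned by a neural language model, whereas this toy version simply counts neighboring words and ignores word order. The three-sentence corpus, the window size, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, unannotated "requests" (made up for this example).
corpus = [
    "play some jazz music",
    "play some rock music",
    "set a timer now",
]

vecs = cooccurrence_vectors(corpus)
sim_jazz_rock = cosine(vecs["jazz"], vecs["rock"])    # share all their neighbors
sim_jazz_timer = cosine(vecs["jazz"], vecs["timer"])  # share no neighbors
```

In this toy corpus, “jazz” and “rock” appear next to the same words, so their vectors point in nearly the same direction, while “jazz” and “timer” have no neighbors in common and score zero. No labels were needed to get there, which is the point Goyal makes.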
The embeddings are based on a scheme called Embeddings from Language Models, or ELMo, simplified to make it efficient enough for a real-time system like Alexa. Like ELMo itself, the researchers’ variant is context-sensitive: a word like “bark” receives different embeddings in “the dog’s bark is loud” and “the tree’s bark is hard.”
In tests, Alexa researchers compared ELMo and their optimized version, dubbed ELMo Light (ELMoL), to a network that used no embedding scheme whatsoever. With both ELMo and ELMoL, they trained the embedding layers on 250 million unannotated requests to Alexa, and used another 4 million annotated requests to existing Alexa services to train all three networks on two standard natural language processing tasks. Specifically, the networks were tasked with (1) intent classification, or determining the action an Alexa customer wanted to perform, and (2) slot tagging, or figuring out to which entities the action should apply.
Once the networks had been trained, they were retrained on limited data to perform new tasks. The network that used the ELMo embeddings performed best, with the ELMoL network coming in a close second. (The aforementioned 8 percent error reduction was achieved with 100 to 500 training examples.)
“Those improvements were greatest when the volume of data for the final retraining — the transfer learning step — was small,” Goyal wrote. “But that is precisely the context in which transfer learning is most useful.”
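The transfer learning step can be sketched in a few lines of Python. This is an illustration under heavy assumptions, not the architecture from the paper: the two-dimensional `PRETRAINED` vectors are hand-made stand-ins for embeddings learned from unannotated requests, the classifier is a plain logistic regression, and the utterances and intent labels are hypothetical. The embeddings stay frozen; only the small classifier is trained on the new task's few labeled examples.

```python
import math

# Hand-made stand-ins for embeddings learned from unannotated data (assumption).
PRETRAINED = {
    "play": [1.0, 0.0], "jazz": [0.9, 0.1], "rock": [0.8, 0.2], "music": [0.9, 0.0],
    "set": [0.0, 1.0], "timer": [0.1, 0.9], "alarm": [0.2, 0.8],
}


def featurize(utterance):
    """Average the frozen embeddings of the known words in an utterance."""
    vecs = [PRETRAINED[w] for w in utterance.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression weights on top of the frozen embeddings."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for utterance, label in examples:
            x = featurize(utterance)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(utterance, w, b):
    """Return the probability that the utterance is a music request."""
    x = featurize(utterance)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Only two labeled examples for the new task: 1 = music intent, 0 = timer intent.
w, b = train([("play jazz", 1), ("set timer", 0)])

p_music = predict("play rock music", w, b)  # "rock" and "music" never appeared with a label
p_alarm = predict("set alarm", w, b)        # "alarm" never appeared with a label
```

Because “rock” and “alarm” already sit near “jazz” and “timer” in the embedding space, the classifier handles them correctly despite never seeing them in labeled data. That is the intuition behind why pretrained embeddings help most when annotated examples are scarce.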
Today’s news follows the announcement of a technique that improves Alexa’s ability to understand multistep commands in one shot, and comes months after Amazon scientists described an AI-driven method that can cut Alexa’s skill selection error rate by 40 percent.