Machine learning algorithms at the core of voice assistants learn to make predictions from libraries of labeled samples. For instance, Amazon’s Alexa is regularly fed text snippets like “Play the Prince song 1999,” where “1999” and “Prince” are mapped to the categories “SongName” and “ArtistName,” respectively. It’s a highly effective means of driving systems to classify data on their own, but it’s not exactly easy: annotation is a painstaking process that must be done by hand.

That’s why researchers at Amazon’s Alexa AI division devised an “active learning” approach that selects which training examples to annotate automatically, based on the likelihood they’ll yield a reduction in Alexa’s error rate. They claim that in experiments, it boosted the accuracy of AI models by 7% to 9% relative to training on randomly selected examples.

“The goal of active learning is to canvass as many candidate examples as possible to find those with the most informational value,” wrote Stan Peshterliev, a senior applied scientist at Amazon. “Consequently, the selection mechanism must be efficient [so that regularly] retraining Alexa’s models on [the selected samples] improves their performance.”

Peshterliev explains that active learning traditionally taps linear classifiers, which assign weights learned from training to words in example sentences to identify those with the most informational value. The sum of the weights yields an overall score, and a score greater than zero indicates that the corresponding sentence belongs to a particular category (e.g., “ArtistName” or “SongName”). Examples are set aside for annotation if they receive scores close to zero, which implies that they were difficult to classify and that, by extension, they’re likely to benefit AI models the most.
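As a rough sketch of that selection rule (not Amazon’s actual code; the word weights and the selection margin here are hypothetical), the idea is to sum per-word weights into a sentence score and set aside sentences whose score lands near zero:

```python
# Illustrative sketch of linear-classifier scoring for active learning.
# Hypothetical learned weights for one category (e.g., music requests);
# unknown words contribute a weight of zero.
WEIGHTS = {"prince": 1.2, "song": 0.4, "1999": 0.9}

def score(sentence, weights):
    """Sum the learned weights of a sentence's words.
    A score greater than zero suggests the sentence belongs to the category."""
    return sum(weights.get(word, 0.0) for word in sentence.lower().split())

def select_for_annotation(sentences, weights, margin=0.5):
    """Keep sentences whose score falls within `margin` of zero --
    the hard-to-classify examples likely to benefit retraining most."""
    return [s for s in sentences if abs(score(s, weights)) < margin]

candidates = select_for_annotation(
    ["Play the Prince song 1999", "Turn up the volume"], WEIGHTS
)
# "Play the Prince song 1999" scores well above zero and is skipped;
# "Turn up the volume" scores near zero and is flagged for annotation.
```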

Peshterliev and colleagues adopted a “committee-based” active learning method that selected low-scoring examples and added another criterion: At least one of the machine learning models had to disagree with the others in its classification. To rerank the selected sentences, they tested a conditional-random-field (CRF) model that classified individual words as belonging to categories such that easily classified words increased the aforementioned scores and difficult-to-classify words decreased them.
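The committee criterion can be sketched as follows. This is a simplified illustration under assumed inputs, not the paper’s implementation: each example carries a hypothetical near-zero-score flag and one predicted label per committee model, and an example is selected only if it is low-scoring and the models’ labels are not unanimous:

```python
# Simplified query-by-committee selection: keep low-scoring examples
# on which at least one model disagrees with the others.

def committee_disagrees(labels):
    """True if the committee's predicted labels are not unanimous."""
    return len(set(labels)) > 1

def select_committee(examples, margin=0.5):
    """examples: list of (sentence, score, labels) triples, where `labels`
    holds one predicted category per committee model.
    Keep near-zero-score sentences on which the committee disagrees."""
    return [
        sentence
        for sentence, sentence_score, labels in examples
        if abs(sentence_score) < margin and committee_disagrees(labels)
    ]

picked = select_committee([
    ("Play 1999", 0.1, ["SongName", "ArtistName"]),       # low score, disagreement: keep
    ("Play Prince", 0.2, ["ArtistName", "ArtistName"]),    # unanimous: skip
    ("Play the song 1999", 2.4, ["SongName", "SongName"]), # confidently classified: skip
])
```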

The researchers say that the addition of the CRF further shrank the error rate by 1% to 2%, and that the full approach yielded a 1% to 3.5% improvement over the best-performing models previously reported.

They describe the full extent of their work in a paper (“Active Learning for New Domains in Natural Language Understanding”) that was presented last week at the annual meeting of the North American Chapter of the Association for Computational Linguistics.