How does Amazon’s Alexa assistant field complex commands like “Alexa, add peanut butter and milk to the shopping list and play music”? With a few sophisticated algorithmic techniques, as it turns out. In a newly published paper (“Practical Semantic Parsing for Spoken Language Understanding”) and accompanying blog post, scientists at Amazon’s Alexa AI research division detail an AI system capable of extracting both the structure and meaning of a sentence, even when the meaning and structure are complex or somewhat ambiguous.

As paper coauthor Rahul Goel explains, the model’s design was informed by two machine learning techniques: transfer learning, which transfers knowledge from an existing AI system to reduce the amount of data required to train a new model, and a copying mechanism, which enables models to deal with data they haven’t seen before.

Traditionally, Alexa parses requests by their intents (e.g., PlayMusic) and slots (e.g., SongName and ArtistName, which would hold values such as “What’s Going On?” and Marvin Gaye). But this approach necessitates a lot of error-prone manual annotation. For instance, the request “Add apples and oranges to shopping list and play music” consists of two main clauses (“add apples and oranges to shopping list” and “play music”) joined by the conjunction “and,” which is encoded in a data set as “(and(addToListIntent(add(ItemName(Apples))(ItemName(Oranges))))(PlayMusicIntent(Mediatype(Music)))).”
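To make that nested encoding concrete, here is a minimal Python sketch that reads the bracketed string quoted above into a tree of (label, children) pairs. The function names `parse_tree` and `leaf_labels` are hypothetical, and the bracket grammar is inferred only from the single example in this article; it is not Amazon's data format specification or tooling.

```python
def parse_tree(text):
    """Parse a bracketed expression into nested (label, children) tuples.

    Assumed grammar, inferred from the one example in the article:
    node := "(" label node* ")"
    """
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            raise ValueError(f"expected '(' at position {pos}")
        pos += 1
        start = pos
        # The label runs until the next bracket of either kind.
        while pos < len(text) and text[pos] not in "()":
            pos += 1
        label = text[start:pos]
        children = []
        # Every "(" that follows opens a child of the current node.
        while pos < len(text) and text[pos] == "(":
            children.append(node())
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
        return (label, children)

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return tree


def leaf_labels(tree):
    """Collect the labels of nodes that have no children, left to right."""
    label, children = tree
    if not children:
        return [label]
    found = []
    for child in children:
        found.extend(leaf_labels(child))
    return found


encoded = ("(and(addToListIntent(add(ItemName(Apples))(ItemName(Oranges))))"
           "(PlayMusicIntent(Mediatype(Music))))")
tree = parse_tree(encoded)
top_level = [child[0] for child in tree[1]]
print(tree[0], top_level, leaf_labels(tree))
```

Read this way, the conjunction “and” sits at the root, each clause becomes one intent subtree beneath it, and the slot values end up at the leaves.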

Above: A parse tree for the request “Which cinemas screen Star Wars tonight?”

Image Credit: Amazon

The researchers chose instead to automatically convert data labeled according to intents and slots into parse trees, which depict requests’ grammatical structures. The team’s semantic parser constructed trees through a series of shift and reduce operations, where a “shift” moved to the next word in the input and a “reduce” assigned a word its final position in the tree. All the while, an attention mechanism tracked data examined by the parser and determined whether to use words from a lexicon or copy over words from the input stream.
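The shift and reduce operations can be illustrated with a small Python sketch of a generic shift-reduce tree builder. It rests on several assumptions: the function name `run_shift_reduce` and the action format are hypothetical, a “reduce” here groups the most recent stack items under a label, and the action sequence is written by hand, whereas in the researchers' system a learned model chooses each step. The paper's actual transition system may differ.

```python
def run_shift_reduce(tokens, actions):
    """Build a tree from a hand-written sequence of shift/reduce actions.

    ("SHIFT",)                -> move the next input word onto the stack
    ("REDUCE", label, count)  -> pop `count` stack items and push them back
                                 as the children of a new node named `label`
    """
    buffer = list(tokens)  # words not yet consumed
    stack = []             # partially built subtrees
    for action in actions:
        if action[0] == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT with no input left")
            stack.append(buffer.pop(0))
        elif action[0] == "REDUCE":
            _, label, count = action
            if count < 1 or count > len(stack):
                raise ValueError("REDUCE needs more items than the stack holds")
            children = stack[-count:]
            del stack[-count:]
            stack.append((label, children))
        else:
            raise ValueError(f"unknown action {action!r}")
    if buffer or len(stack) != 1:
        raise ValueError("actions did not produce a single complete tree")
    return stack[0]


# Slot values from "add apples and oranges to shopping list and play music".
tokens = ["Apples", "Oranges", "Music"]
actions = [
    ("SHIFT",), ("REDUCE", "ItemName", 1),
    ("SHIFT",), ("REDUCE", "ItemName", 1),
    ("REDUCE", "add", 2),
    ("REDUCE", "addToListIntent", 1),
    ("SHIFT",), ("REDUCE", "Mediatype", 1),
    ("REDUCE", "PlayMusicIntent", 1),
    ("REDUCE", "and", 2),
]
tree = run_shift_reduce(tokens, actions)
print(tree)
```

The sketch leaves out the attention and copy components entirely; it only shows how a flat sequence of actions determines a nested structure.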


In tests with natural-language understanding (NLU) data from Alexa interactions, the copy mechanism alone increased accuracy by an average of 61%, the researchers report, while transfer learning further improved it by 6.4%. And in a separate set of question-answering tests that drew on two public data sets (with questions like “What restaurant can you eat outside at?” or “How many steals did Kobe Bryant have in 2004?”), transfer learning boosted performance by 10.8%.

“The fact that our semantic parser improves performance on both natural-language-understanding and question-answering tasks indicates its promise as a general-purpose technique for representing meaning, which could have other applications, as well,” Goel wrote.

The work is scheduled to be presented at the 16th annual conference of the North American Chapter of the Association for Computational Linguistics in New Orleans, Louisiana, in June.
