New AI technique targets Alexa's contextual understanding

Dialogue state tracking, or estimating and keeping tabs on a person's goals throughout a multiturn conversation, is one of the ways Alexa figures out what users want. By combining conversation history with the most recent command, Amazon's intelligent assistant can better map slot names -- the price of a hotel or its star rating, for example -- to slot values, or entities mentioned in a dialogue.

Alexa already performs dialogue state tracking pretty effectively, but a team of scientists at Amazon's R&D division thinks there's room for improvement. In a new paper ("Dialog State Tracking: A Neural Reading Comprehension Approach") scheduled to be presented at the International Speech Communication Association's Special Interest Group on Discourse and Dialogue, they propose an AI system that formulates dialogue state tracking as a classic question-answering problem. In other words, their machine learning model decides on the slot value for each slot name after reading a conversational passage.

The team reports that their technique yielded a 6.5% improvement in slot tracking accuracy over the previous state of the art in quantitative tests, and that it achieved an accuracy of up to 96% per slot on a development data set.

The system comprises three models. The first, a slot carryover model, predicts whether a slot name and slot value pair should be carried over from the previous turn or updated at the current turn. If the carryover model decides on an update, a separate slot type model predicts which of four types the slot value falls into ("Yes," "No," "Don't care," or "Span"). The third and final model extracts the slot value span from the dialogue when the slot type model classifies the type as "Span."
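The three-stage decision described above can be pictured as a simple control flow. The following Python sketch is illustrative only and is not the team's implementation: the function `update_slot`, the three callables, and the hard-coded toy models are hypothetical stand-ins for the trained neural models.

```python
from typing import Callable, Dict, List, Optional

# The four slot types named in the article, written as plain labels.
SLOT_TYPES = ("yes", "no", "dontcare", "span")

# Hypothetical model signature: (slot name, dialogue turns) -> prediction.
CarryoverModel = Callable[[str, List[str]], bool]
TypeModel = Callable[[str, List[str]], str]
SpanModel = Callable[[str, List[str]], str]


def update_slot(
    slot_name: str,
    previous_value: Optional[str],
    dialogue: List[str],
    carryover_model: CarryoverModel,
    type_model: TypeModel,
    span_model: SpanModel,
) -> Optional[str]:
    """Return the new value for one slot after the latest turn."""
    # Stage 1: keep the value from the previous turn, or update it?
    if carryover_model(slot_name, dialogue):
        return previous_value

    # Stage 2: on an update, classify the type of the new value.
    slot_type = type_model(slot_name, dialogue)
    if slot_type not in SLOT_TYPES:
        raise ValueError(f"unexpected slot type: {slot_type!r}")

    # Stage 3: only "span" requires extracting words from the dialogue.
    if slot_type == "span":
        return span_model(slot_name, dialogue)
    return slot_type


# Toy stand-ins with hard-coded answers (assumptions for illustration).
dialogue = [
    "I need a hotel.",
    "Something cheap, and I don't care about the star rating.",
]
previous_state: Dict[str, Optional[str]] = {
    "price": None, "stars": "4", "area": "north",
}

def toy_carryover(slot: str, turns: List[str]) -> bool:
    return slot == "area"  # only "area" is left unchanged

def toy_type(slot: str, turns: List[str]) -> str:
    return "span" if slot == "price" else "dontcare"

def toy_span(slot: str, turns: List[str]) -> str:
    return "cheap"

new_state = {
    slot: update_slot(slot, value, dialogue, toy_carryover, toy_type, toy_span)
    for slot, value in previous_state.items()
}
print(new_state)
```

In this toy run, "area" is carried over unchanged, "stars" is updated to the "Don't care" type, and "price" is filled by the span extractor.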

Because reading comprehension-based question answering models such as these extract answers as spans of consecutive words from the passage, they obviate the need to calculate distributions over thousands or millions of candidate values for each slot, explained Alexa AI group applied scientist Shuyang Gao in a blog post. Moreover, combining them with traditional state tracking frameworks further boosts slot tracking accuracy.
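To see why span extraction sidesteps the scale problem, consider a common way reading comprehension models select an answer: score every token as a possible start and end, then pick the best-scoring pair. The Python sketch below assumes this start/end scoring scheme, which the article does not confirm for this paper; the function `best_span`, the `max_len` cap, and all scores are invented for illustration.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 5,
) -> Tuple[int, int]:
    """Return (start, end) token indices of the highest-scoring span.

    The search cost depends on the passage length and max_len,
    not on how many distinct values a slot could take.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider spans that end at or after their start.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Hypothetical per-token scores a trained model might emit.
tokens = "i want a hotel in the city centre".split()
start = [0.0, 0.1, 0.0, 0.2, 0.1, 0.3, 2.5, 0.4]
end = [0.0, 0.0, 0.1, 0.3, 0.0, 0.1, 0.6, 2.8]

i, j = best_span(start, end)
value = " ".join(tokens[i : j + 1])
print(value)
```

With these made-up scores, the selected span covers the tokens "city centre", which would become the slot value.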

"Historically, research on dialogue state tracking has focused on methods that estimate distributions over all the possible values for a given slot. But modern task-oriented dialogue systems present problems of scale," wrote Gao. "Machine reading comprehension is an active research area that has made a lot of great process in recent years. By connecting it with dialogue state tracking, we can leverage reading comprehension-based approaches and develop robust new models for task-oriented dialogue systems."