What might be the key to chatbots or voice-enabled assistants that respond in more natural, humanlike ways? Researchers at Rasa, a Berlin-based startup developing a standard infrastructure layer for conversational AI, believe selective attention might play an outsized role. In a preprint paper published this week on Arxiv.org, they detail a system that can selectively ignore or attend to dialogue history, enabling it to skip over responses in turns of dialogue that don’t directly address the previous utterance.
“Conversational AI assistants promise to help users achieve a task through natural language. Interpreting simple instructions like please turn on the lights is relatively straightforward, but to handle more complex tasks these systems must be able to engage in multi-turn conversations,” wrote the coauthors. “Each utterance in a conversation does not necessarily have to be a response to the most recent utterance by the other party.”
The team proposes what they call the Transformer Embedding Dialogue (TED) policy, which chooses which dialogue turns to skip with the help of Transformers. For the uninitiated, Transformers are a type of neural architecture introduced in a 2017 paper coauthored by scientists at Google Brain, Google’s AI research division. As do all deep neural networks, they contain neurons (mathematical functions) arranged in interconnected layers that transmit signals from input data and slowly adjust the synaptic strength (weights) of each connection. That’s how deep neural networks extract features and learn to make predictions, but Transformers are distinguished by attention: every output element is connected to every input element, and the weightings between them are calculated dynamically from the input itself.
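To make the idea of dynamically calculated weightings concrete, here is a minimal sketch of scaled dot-product attention applied to a toy dialogue history. It is an illustration only, not the TED policy from the paper: the three-dimensional turn embeddings, the query vector, and the function names `softmax` and `attend` are all assumptions invented for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity between the query and each key, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical embeddings for three earlier dialogue turns (made up for illustration).
turns = [
    [1.0, 0.0, 0.0],  # turn 0: user asks to book a hotel
    [0.0, 1.0, 0.0],  # turn 1: off-topic chit-chat
    [0.9, 0.1, 0.0],  # turn 2: user returns to the booking
]

# Hypothetical query representing the current, booking-related state.
query = [4.0, 0.0, 0.0]

weights, context = attend(query, turns, turns)
for i, w in enumerate(weights):
    print(f"turn {i}: weight {w:.3f}")
```

In this toy setup the off-topic turn receives the smallest weight, so it contributes least to the combined context vector. A trained model would learn its embeddings and projections from data rather than use hand-picked vectors.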
Importantly, the researchers say that the TED policy, which can be used in either a modular or end-to-end fashion, doesn’t assume that the whole dialogue sequence is relevant for choosing a response to an utterance. Instead, it selects on the fly which historical turns are relevant, which helps it to better recover from non-sequiturs and other unexpected inputs.
In a series of experiments, the team sourced a freely available data set (MultiWOZ) containing 10,438 human-human dialogues for tasks in seven different domains: hotel, restaurant, train, taxi, attraction, hospital, and police. After training the model on 740 dialogues and compiling a corpus of 185 for testing, they conducted a detailed analysis. Although the data set wasn’t ideal for supervised learning of dialogue policies, due in part to its lack of historical dependence, the researchers report that the model successfully recovered from “non-cooperative” user behavior and outperformed baseline approaches at every dialogue turn (excepting a few mistakes).
Rasa hasn’t yet incorporated the model into production systems, but it could bolster the company’s suite of conversational AI tools, Rasa Stack, which targets sales, marketing, and advanced customer service in health care, insurance, telecom, banking, and other enterprise verticals. Adobe recently used Rasa’s tools to build an AI assistant that enables users to search through Adobe Stock using natural language commands. And Rasa says that “thousands” of developers have downloaded Rasa Stack over half a million times.