How might AI assistants like Google Assistant better support new services without the need for additional data and retraining? That’s the question Google researchers sought to answer in a recent study, which introduces an approach that uses a single model across services, with no domain-specific parameters. As part of the study, the team released the Schema-Guided Dialogue (SGD) corpus, which they claim is the largest publicly available compilation of task-oriented dialogues.

“Today’s virtual assistants help users to accomplish a wide variety of tasks, including finding flights, searching for nearby events and movies, making reservations, sourcing information from the web and more,” wrote software engineer Abhinav Rastogi and Google Research engineering lead Pranav Khaitan in a blog post. “Despite tremendous progress … [adaptability challenges] have often been overlooked in state-of-the-art models. This is due, in part, to the absence of suitable datasets that match the scale and complexity confronted by such virtual assistants.”

To this end, SGD consists of over 18,000 annotated conversations between a human and a virtual assistant involving interactions with services spanning 17 domains, ranging from banks and events to media, calendar, travel, and weather. For most of the domains, the data set contains several different APIs, many of which have overlapping functionalities but different interfaces reflecting typical real-world scenarios. And the evaluation set comprises services that aren’t present in the training set, chiefly to quantify the robustness of the models to changes in APIs or to the addition of new APIs.

As for the aforementioned schema-guided approach, it taps natural language descriptions of each service or API and their associated attributes to learn a distributed semantic representation, which is given as an additional input to a dialogue system that’s subsequently implemented as a single model. The team says that the unified model — which is at the core of Google’s open source dialogue state tracking model — facilitates representation of common knowledge between similar concepts in different services, making it possible to operate over new services that aren’t present in the training data.
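
To make the idea concrete, here is a minimal sketch of how matching on descriptions rather than on hard-coded slot names might work. It is not Google's implementation: the two flight services, their slot names, and their descriptions are invented for illustration, and a toy bag-of-words vector stands in for the learned semantic representation the researchers describe.

```python
import math
import re
from collections import Counter

# Hypothetical schemas: service names, slot names, and descriptions are
# invented for illustration and are not taken from the SGD corpus.
SEEN_SERVICE = {
    "name": "FlightsA",
    "slots": {
        "origin": "city the flight departs from",
        "destination": "city the flight arrives in",
        "depart_date": "date of the departing flight",
    },
}
UNSEEN_SERVICE = {
    "name": "FlightsB",
    "slots": {
        "from_city": "city where the flight departs from",
        "to_city": "city where the flight arrives",
        "leaving_on": "date on which the flight is departing",
    },
}


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_slot(description, schema):
    """Return the slot in `schema` whose description best matches `description`."""
    query = embed(description)
    return max(
        schema["slots"],
        key=lambda slot: cosine(query, embed(schema["slots"][slot])),
    )


# Align each slot of the unseen service with the most similar known slot,
# using only the natural language descriptions.
mapping = {
    slot: closest_slot(desc, SEEN_SERVICE)
    for slot, desc in UNSEEN_SERVICE["slots"].items()
}
print(mapping)
```

In this sketch the second service's slots line up with their counterparts in the first even though no slot name is shared, because nothing in the matching logic is tied to a particular service. A real system would presumably replace the word-count vectors with embeddings from a trained language model, which could also relate descriptions that share meaning but not vocabulary.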

“We believe that this dataset will act as a good benchmark for building large-scale dialogue models,” wrote Rastogi and Khaitan. “We are excited and looking forward to all the innovative ways in which the research community will use it for the advancement of dialogue technologies.”

The release of the new data set and model comes after the open-sourcing of Google’s Coached Conversational Preference Elicitation (CCPE) and Taskmaster-1, a pair of data sets of dialogue between two people. (The former includes 500 dialogues with people about their movie preferences, 10,000 in total, across 12,000 utterances.) Google described them as a step toward natural language systems capable of achieving human-level performance.

