Google today open-sourced Coached Conversational Preference Elicitation (CCPE) and Taskmaster-1, datasets of dialog between two people. Both datasets are being shared by Google AI researchers to supply the training material necessary to model natural language systems that achieve human-level performance.

Google researchers call CCPE a new way to collect voice data. It includes roughly 500 dialogues in which people discuss their movie preferences, spanning about 12,000 utterances.

Movie preferences were chosen as a topic because of the value of metadata such as the names of actors and directors.

“We do not restrict the workers to detailed scripts or to a small knowledge base and hence we observe that our dataset contains more realistic and diverse conversations in comparison to existing datasets,” the paper describing CCPE reads.

The Taskmaster-1 dataset is made up of more than 13,200 dialogue samples. Both it and CCPE were created using the Wizard of Oz method, in which one person plays the role of the agent while workers recruited from crowdsourcing websites play the part of an average digital assistant user.

Taskmaster-1 contains dialogue across six categories: ordering pizza, creating auto repair appointments, setting up ride service, ordering movie tickets, ordering coffee drinks, and making restaurant reservations.

In other recent Google conversational AI news, Google’s Project Euphonia introduced conversational AI that improves speech recognition for people with accents or with ALS, and Google DeepMind researchers worked with other AI community stakeholders to introduce the SuperGLUE benchmark for more robust conversational AI.
