Google's TensorFlow team open-sources speech recognition dataset for DIY AI

Google researchers open-sourced a dataset today to give DIY makers interested in artificial intelligence more tools to create basic voice commands for a range of smart devices. Created by the TensorFlow and AIY teams at Google, the Speech Commands dataset is a collection of 65,000 utterances of 30 words for the training and inference of AI models.

AIY Projects was launched in May to support do-it-yourself makers who want to tinker with AI. The initiative plans to launch a series of reference designs, and began with speech recognition and a smart speaker you can make in a cardboard box.

“The infrastructure we used to create the data has been open sourced too, and we hope to see it used by the wider community to create their own versions, especially to cover underserved languages and applications,” Google Brain software engineer Pete Warden wrote in a blog post today.

Warden said Google hopes more accents and variations will be shared with the project over time to broaden the dataset beyond the contributions already made by thousands of people. Unlike other datasets, you can actually add your voice to Speech Commands. Visit the speech portion of the AIY Projects website and you’ll be invited to contribute short recordings of 135 simple words like “bird,” “stop,” or “go,” as well as a series of numbers and names.

Some models trained using the Speech Commands dataset may not yet understand every user's voice, because some groups aren't well represented in voice samples gathered by the project thus far, Warden said.

A lack of local dialects or slang in training data has been found to exclude certain groups of people when they give a device a voice command.

A study published last month by Stanford AI researchers found that Equilid, an NLP language identifier trained on sources such as Twitter and Urban Dictionary, is more accurate than identifiers trained on text that can exclude some users based on age, race, or the way they naturally talk. Initial results found Equilid was more accurate than Google’s CLD2. Additional academic tests of speech recognition tools also found that popular NLP tools struggled to understand African-American users.
