Google researchers open-sourced a dataset today to give DIY makers interested in artificial intelligence more tools to create basic voice commands for a range of smart devices. Created by the TensorFlow and AIY teams at Google, the Speech Commands dataset is a collection of 65,000 utterances of 30 words for the training and inference of AI models.

AIY Projects was launched in May to support do-it-yourself makers who want to tinker with AI. The initiative plans to launch a series of reference designs, and began with speech recognition and a smart speaker you can make in a cardboard box.

“The infrastructure we used to create the data has been open sourced too, and we hope to see it used by the wider community to create their own versions, especially to cover underserved languages and applications,” Google Brain software engineer Pete Warden wrote in a blog post today.

Warden said Google hopes more accents and variations are shared with the project over time to broaden the dataset beyond contributions made already by thousands of people. Unlike other datasets, you can actually add your voice to Speech Commands. Visit the speech portion of the AIY Projects website and you’ll be invited to contribute short recordings of 135 simple words like “bird,” “stop,” or “go,” as well as a series of numbers and names.


Some models trained using the Speech Commands dataset may not yet understand every user’s voice, because some groups aren’t well represented in voice samples gathered by the project thus far, Warden said.

A lack of local dialects or slang in training data has been found to exclude certain groups of people when they give a device a voice command.

A study published last month by Stanford AI researchers found that Equilid, a language identification tool trained on sources like Twitter and Urban Dictionary, is more accurate than identifiers trained on text that can exclude some users based on age, race, or the way they naturally talk. Initial results found Equilid was more accurate than Google’s CLD2. Additional academic tests of speech recognition tools also found that popular NLP tools struggled to understand African-American users.
