Trained neural networks at Microsoft are now as good at recognizing the human voice as humans are, researchers announced today.
In a report released Monday, researchers pitted Microsoft’s automated system against professional transcriptionists on the NIST 2000 test set and found, for the first time, a higher error rate among humans than computers.
“This marks the first time that human parity has been reported for conversational speech,” said the report published Monday.
Better speech recognition could impact a range of Microsoft products in the future.
“The milestone will have broad implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription, and personal digital assistants such as Cortana,” Microsoft said in a blog post published today about the achievement.
Microsoft CEO Satya Nadella has declared that conversation will be as influential to computing as the graphical user interface (GUI) and that conversation will enter all computing in the near future.
Other advancements have been made recently in a decades-long quest for computer recognition of the human voice.
Earlier this year, in conjunction with the University of Washington and Stanford University, Baidu Research announced that the Baidu program Deep Speech 2 was able to transcribe speech three times faster than humans can type.
Many early advancements in conversational speech recognition came from DARPA, a research arm of the Department of Defense that’s been active in speech recognition research since the 1970s.