I’m sure you’re happy to know that deep neural networks perform significantly better than shallow networks. But you might be thrilled to learn that Microsoft has doubled the speed at which it can translate what you say into text … while also improving accuracy.
What do the deep neural networks (DNNs) have to do with it?
Microsoft says that DNNs, which function more like a human brain than a traditional computer, can detect tiny variations in speech that stay the same even when your voice changes. In other words, even when you speak faster or yell, or when you’re winded from running up the stairs, those variations remain stable. Even better, they remain stable from individual to individual as long as you’re speaking the same language.
The result is that a message that takes 1.06 seconds to render into text with Microsoft’s old technology now takes 0.53 seconds. That may not seem like much, but in the video below it feels almost instantaneous. To skip the theory and see the goods, fast-forward to the 1:08 mark:
The best part is that the error rate is now down as well, from 16 percent to 13.5 percent, and that the technology is resistant to background noise interference. As we use more and more voice-recognition technology to control our mobile devices and gaming systems and to dictate our texts and messages, all of that is a great help.
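For readers who want to see what those figures amount to, here is a quick back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function and variable names are made up for illustration and have nothing to do with Microsoft's software.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# All names here are illustrative; only the four numbers come from the text.

def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new time is than the old one."""
    return old_seconds / new_seconds


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which a rate dropped, relative to where it started."""
    return (old_rate - new_rate) / old_rate


# Time to render a message into text: 1.06 seconds before, 0.53 seconds after.
latency_speedup = speedup(1.06, 0.53)

# Error rate: 16 percent before, 13.5 percent after.
error_drop_points = 16.0 - 13.5                       # in percentage points
error_drop_relative = relative_reduction(16.0, 13.5)  # as a fraction of the old rate

print(f"Speedup: {latency_speedup:.1f}x")
print(f"Error rate drop: {error_drop_points:.1f} percentage points")
print(f"Relative error reduction: {error_drop_relative:.1%}")
```

In other words, halving the time is the same as doubling the speed, and a drop of 2.5 percentage points from a starting error rate of 16 percent means roughly one in six of the old errors goes away.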
The update is currently rolling out to Microsoft data centers in the U.S.
Image credit: Microsoft