Baidu, the Chinese company operating a search engine, a mobile browser, and other web services, is announcing today the launch of SwiftScribe, a web app that’s meant to help people transcribe audio recordings more quickly, using — you guessed it! — artificial intelligence (AI).
Baidu in the past few years has been honing its DeepSpeech software for speech recognition. Last year, the company introduced TalkType, an Android keyboard that, using DeepSpeech, puts speech input first and typing second, based on the idea that you can enter information more quickly when you say it than when you peck. Now Baidu is coming out with another app enhanced with DeepSpeech, one that could arguably find better footing in a professional setting.
Amazon, Apple, Google, and Microsoft have all been working on speech recognition right alongside Baidu, but none of those four has come up with something aimed at longer-form transcription.
In SwiftScribe, once you choose a file to upload in .wav or .mp3 format, the system goes to work processing it. For me, a 30-second file was ready in 10 seconds, and a one-minute file was ready in less than 30 seconds. SwiftScribe can handle up to an hour of audio in any given file, but that will take 20 minutes to process, Baidu project manager Tian Wu told VentureBeat in an interview.
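For a rough sense of what those timings mean, the sketch below expresses each quoted processing time as a fraction of the audio's length. The function name and the table of observations are illustrative only and not part of SwiftScribe; the one-minute figure is an upper bound, since that file was ready in "less than 30 seconds."

```python
def processing_ratio(audio_seconds: float, processing_seconds: float) -> float:
    """Return processing time as a fraction of the audio's length."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the article: (audio length, processing time), in seconds.
observations = {
    "30-second file": (30, 10),
    "one-minute file": (60, 30),      # upper bound: "less than 30 seconds"
    "one-hour file": (3600, 20 * 60), # Baidu's stated figure
}

for label, (audio, processing) in observations.items():
    ratio = processing_ratio(audio, processing)
    print(f"{label}: processing takes {ratio:.2f}x the audio length")
```

On these numbers, processing takes somewhere between a third and a half of the recording's running time.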
From there, you’ll need to go in and make some corrections, like fixing capitalization, adding punctuation, and changing the spelling of certain words. Keyboard shortcuts help you more efficiently change the speed of the audio, rewind, and add a line break.
SwiftScribe was inspired partly by Wu’s experience transcribing many interviews during her time in graduate school at the University of California, Santa Barbara.
“English is not my first language,” said Wu, who is from China. “It took 10 hours to transcribe one hour of audio. That’s my personal experience. Usually it will take a professional four to six hours to transcribe a one-hour audio clip.”
But Wu and her colleague Nina Wei also took inspiration from conversations with several transcriptionists. Wu’s team believes SwiftScribe can help people transcribe audio 1.67 times faster — in 40 percent less time — than they would on their own. That would imply that they could do more work and ultimately get paid more for their work, Wu said.
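The two figures describe the same claim: working 1.67 times faster means each job takes 1/1.67 of the original time. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and applying the speedup to Wu's four-to-six-hour estimate for professionals is an assumption made here, not a number Baidu provided.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of time saved when a task runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


def hours_with_speedup(baseline_hours: float, speedup: float) -> float:
    """Hours a task takes after applying the speedup to a baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


SPEEDUP = 1.67  # figure quoted by Wu's team

print(f"Time saved: {time_saved_fraction(SPEEDUP):.0%}")

# Assumption: apply the speedup to the 4-6 hours a professional
# reportedly needs for one hour of audio.
for baseline in (4, 6):
    print(f"{baseline} h baseline -> {hours_with_speedup(baseline, SPEEDUP):.1f} h")
```

Under that assumption, a one-hour clip that takes a professional four to six hours by hand would take roughly two and a half to three and a half hours.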
While the product is certainly designed for transcriptionists — who are used to working on computers as opposed to mobile devices, hence the fact that SwiftScribe is only available as a web app — SwiftScribe could also come in handy for other people, like journalists and historians.
Today, Baidu is providing SwiftScribe as a free service — unlike Nuance’s Dragon software. “But in the future, we hope to turn it into a business,” Wu said.
Down the line, the team could enhance the app with video transcription and captioning, support for more file formats, and an option for automatically adding punctuation, she said.
A blog post has more detail.