Hot on the heels of newly introduced Comprehend services and ahead of the AWS re:Invent conference in Las Vegas later this month, Amazon today announced that Amazon Transcribe, its automatic speech recognition (ASR) service, is gaining support for real-time transcriptions.
The live audio transcription feature is generally available this week and enables developers to pass audio streams to Transcribe and receive text transcripts in real time. As Paul Zhao, senior product manager at AWS’ machine learning division, and Paul Kohan, senior software engineer at Amazon Transcribe, explained in a blog post, it uses the HTTP/2 protocol to carry audio and transcripts between apps and Transcribe, relying specifically on HTTP/2’s bidirectional streams, which let apps send and receive data at the same time.
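To make the send-and-receive-at-once idea concrete, here is a minimal Python sketch. It does not call Transcribe or open an HTTP/2 connection; the `fake_service` coroutine, the in-memory queues, the 100-millisecond chunk size, and the 16-bit mono PCM format are all assumptions made for illustration. It only shows the general pattern of one task pushing audio chunks while another collects results as they arrive.

```python
import asyncio

SAMPLE_RATE_HZ = 16_000  # one of the two sample rates mentioned in the article
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
CHUNK_MS = 100           # assumption: 100 ms of audio per chunk


def chunk_audio(pcm: bytes, sample_rate: int = SAMPLE_RATE_HZ,
                chunk_ms: int = CHUNK_MS) -> list:
    """Split raw PCM bytes into fixed-duration chunks."""
    chunk_bytes = sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


async def send_audio(chunks, uplink: asyncio.Queue) -> None:
    """Push chunks upstream, yielding control between sends."""
    for chunk in chunks:
        await uplink.put(chunk)
        await asyncio.sleep(0)  # let the other tasks run
    await uplink.put(None)      # None marks end of stream


async def fake_service(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Hypothetical stand-in for a remote service: one placeholder per chunk."""
    index = 0
    while (chunk := await uplink.get()) is not None:
        await downlink.put(f"partial result {index} ({len(chunk)} bytes)")
        index += 1
    await downlink.put(None)


async def receive_results(downlink: asyncio.Queue) -> list:
    """Collect results while audio is still being sent."""
    results = []
    while (text := await downlink.get()) is not None:
        results.append(text)
    return results


async def stream(pcm: bytes) -> list:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    # All three coroutines run concurrently, mimicking a bidirectional stream.
    _, _, results = await asyncio.gather(
        send_audio(chunk_audio(pcm), uplink),
        fake_service(uplink, downlink),
        receive_results(downlink),
    )
    return results


def run_demo(seconds: int = 1) -> list:
    """Stream `seconds` of silence through the stand-in service."""
    silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds)
    return asyncio.run(stream(silence))


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In a real integration the two queues would be replaced by the request and response sides of a single HTTP/2 stream, and the placeholder strings by the transcripts the service returns.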
“Real-time transcriptions benefit use cases across diverse verticals, including contact centers, media and entertainment, courtroom record keeping, finance, and insurance,” Zhao and Kohan wrote. “In media, live broadcasting of news or shows can benefit from live subtitling. Video game companies can use streaming transcription to meet accessibility requirements for in-game chat, helping players who have hearing impairments. In the legal domain, courtrooms can leverage real-time transcriptions to enable stenography, while lawyers can also make legal annotations on top of live transcripts for deposition purposes. In business productivity, companies can leverage real-time transcription to capture meeting notes on the fly.”
Real-time transcription isn’t particularly novel: Google’s Cloud Speech-to-Text service, Twilio’s Speech Recognition API, and IBM’s Watson Speech to Text have supported it for years. But Transcribe’s approach delivers “quicker” and “more reactive” results, Zhao and Kohan claim.
Amazon has published an example application that demonstrates how the Amazon Web Services software development kit can be used to take advantage of real-time audio streaming. It’s available as open source on GitHub.
Amazon Transcribe launched publicly in April alongside Translate. It currently supports both 16 kHz and 8 kHz audio streams; multiple audio encodings, such as WAV, MP3, MP4, and FLAC; and multiple languages, including U.S. English, Spanish, British English, Australian English, and Canadian French.
The prebuilt AI API sits within AWS’ suite of other AI services, among them Lex for natural language understanding, Polly for speech generation, and Rekognition for image processing.
Transcribe’s upgrades follow the launch of AWS’ second set of high-security GovCloud datacenters in the U.S. and Amazon’s announcement that it plans to open datacenters in Italy in 2020. Earlier this month, AWS made its Translate, Transcribe, and Comprehend services HIPAA-eligible.