Google today detailed SoundStream, an end-to-end “neural” audio codec that can provide higher-quality audio while encoding different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. The company claims it is the first AI-powered codec to handle both speech and music while running in real time on a smartphone processor.
Audio codecs compress audio to reduce storage and bandwidth requirements. Ideally, the decoded audio should be perceptually indistinguishable from the original, and the encoding process should introduce little latency. While most codecs rely on domain expertise and carefully engineered signal processing pipelines, there’s been interest in replacing those handcrafted pipelines with AI that learns how to encode audio from data.
Earlier this year, Google released Lyra, a neural audio codec trained to compress speech at low bitrates. SoundStream extends this work with a system consisting of an encoder, a quantizer, and a decoder. The encoder converts audio into a coded signal, the quantizer compresses that signal, and the decoder converts it back to audio. Once trained, the encoder and decoder can be run on separate clients to transmit audio over the internet, and the decoder can operate at any bitrate.
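To make the encoder, quantizer, and decoder split more concrete, here is a minimal Python sketch of the quantizer stage only. It rests on several assumptions that are not in Google's announcement as summarized here: it uses a residual vector quantizer (the approach described in the SoundStream paper), the codebooks are random stand-ins for learned ones, the input frame stands in for the output of a learned encoder, and every function name is hypothetical. The frame rate and codebook size passed to `bitrate_bps` are illustrative values chosen so the arithmetic lands on 3kbps.

```python
import math
import random

def make_codebook(size, dim, scale, rng):
    """Random stand-in for a learned codebook. Entry 0 is the zero vector,
    so adding a layer can never make the reconstruction worse."""
    codes = [[0.0] * dim]
    codes += [[rng.uniform(-scale, scale) for _ in range(dim)]
              for _ in range(size - 1)]
    return codes

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def quantize(vec, codebooks):
    """Each layer encodes whatever the previous layers left unexplained."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def dequantize(indices, codebooks, dim):
    """Sum the selected entries; passing fewer indices uses fewer layers."""
    out = [0.0] * dim
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

def bitrate_bps(frames_per_second, num_layers, codebook_size):
    """Bits per second needed to transmit one index per layer per frame."""
    return frames_per_second * num_layers * math.log2(codebook_size)

rng = random.Random(0)
DIM = 4
# Four layers of 16 entries each, with progressively finer scales.
codebooks = [make_codebook(16, DIM, 1.0 / (2 ** k), rng) for k in range(4)]

frame = [0.3, -0.7, 0.1, 0.5]  # hypothetical encoder output for one frame
indices = quantize(frame, codebooks)

def squared_error(num_layers):
    approx = dequantize(indices[:num_layers], codebooks, DIM)
    return sum((a - b) ** 2 for a, b in zip(frame, approx))

# Reconstruction error when the receiver uses 0, 1, 2, 3, or 4 layers.
errors = [squared_error(n) for n in range(len(codebooks) + 1)]

# Illustrative numbers: 75 frames/s, 4 layers, 1,024 entries per codebook.
example_bps = bitrate_bps(75, 4, 1024)
```

In this sketch, the sender transmits only the list of `indices` for each frame. Dropping trailing indices lowers the bitrate at the cost of a coarser reconstruction, which is one way a single model could serve several bitrates.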
Compressing audio
In traditional audio processing pipelines, compression and enhancement (i.e., the removal of background noise) are typically performed by different modules. But SoundStream is designed to carry out compression and enhancement at the same time. At 3kbps, SoundStream outperforms the popular Opus codec at 12kbps and approaches the quality of EVS at 9.6kbps while using 3.2 to 4 times fewer bits, Google claims. Moreover, SoundStream performs better than the current version of Lyra when compared at the same bitrate.
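The "3.2 to 4 times fewer bits" figure follows directly from the bitrates quoted above. The short Python check below reproduces that arithmetic; the variable names are illustrative, and the only inputs are the three bitrates Google cites.

```python
# Bitrates quoted in Google's comparison, in kilobits per second.
SOUNDSTREAM_KBPS = 3.0
baseline_kbps = {"Opus": 12.0, "EVS": 9.6}

# How many times more bits each baseline spends than SoundStream.
ratios = {name: round(kbps / SOUNDSTREAM_KBPS, 1)
          for name, kbps in baseline_kbps.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the bits of SoundStream at 3kbps")
```

The EVS comparison gives the lower bound of 3.2 and the Opus comparison gives the upper bound of 4.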
[Audio sample: reference audio before processing with SoundStream]
[Audio sample: the audio after processing with SoundStream]
Google cautions that SoundStream is still in the experimental stages. However, the company plans to release an updated version of Lyra that incorporates its components to deliver both higher audio quality and “reduced complexity.”
“Efficient compression is necessary whenever one needs to transmit audio, whether when streaming a video or during a conference call. SoundStream is an important step toward improving machine learning-driven audio codecs. It outperforms state-of-the-art codecs, such as Opus and EVS, can enhance audio on demand, and requires deployment of only a single scalable model, rather than many,” Google research scientist Neil Zeghidour and staff research scientist Marco Tagliasacchi wrote in a blog post. “By integrating SoundStream with Lyra, developers can leverage the existing Lyra APIs and tools for their work, providing both flexibility and better sound quality.”