Spotify open-sources Klio, a framework for AI audio research

This week at the 2020 International Society for Music Information Retrieval Conference, Spotify open-sourced Klio, an ecosystem that allows data scientists to process audio files (or any binary files) easily and at scale. It was built to run Spotify's large-scale audio intelligence systems and is leveraged by the company's engineers and audio scientists to help develop and deploy next-generation audio algorithms.

The Apache Beam-based Klio enables organizations to create media processing systems that share tooling and infrastructure between production systems and research teams. The platform's architecture encourages reusable jobs and shared outputs, ostensibly lowering maintenance and recomputation costs. Moreover, Klio supports continuous, event-driven processing of rapidly growing catalogs of media content, giving engineers a framework for turning processing jobs into production services and giving organizations a way to process new content as it is ingested.
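The article does not show Klio's actual API, so what follows is only a minimal, library-free Python sketch of the general pattern described above: an event-driven handler that reacts to each newly ingested track, runs a chain of jobs that build on each other's outputs, and skips any job whose output already exists in shared storage. Every name here (`SharedOutputStore`, `Job`, `handle_event`, and the `decode` and `separate_vocals` jobs) is hypothetical, an in-memory dict stands in for shared storage, and Apache Beam itself is not used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedOutputStore:
    """Hypothetical in-memory stand-in for shared storage that all jobs read and write."""
    outputs: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def has(self, job: str, track_id: str) -> bool:
        return track_id in self.outputs.get(job, {})

    def put(self, job: str, track_id: str, result: str) -> None:
        self.outputs.setdefault(job, {})[track_id] = result

    def get(self, job: str, track_id: str) -> str:
        return self.outputs[job][track_id]


@dataclass
class Job:
    """Hypothetical job: a name, a processing function, and the upstream jobs it reuses."""
    name: str
    process: Callable[[str, Dict[str, str]], str]
    depends_on: List[str] = field(default_factory=list)


def handle_event(track_id: str, jobs: List[Job],
                 store: SharedOutputStore, log: List[str]) -> None:
    """React to one 'track ingested' event. Jobs must be listed in dependency order.

    A job whose output for this track already exists is skipped, so no work is repeated;
    otherwise it receives its upstream jobs' shared outputs as input.
    """
    for job in jobs:
        if store.has(job.name, track_id):
            log.append(f"skip {job.name}:{track_id}")
            continue
        upstream = {dep: store.get(dep, track_id) for dep in job.depends_on}
        store.put(job.name, track_id, job.process(track_id, upstream))
        log.append(f"run {job.name}:{track_id}")


# Usage: two chained jobs; track-1 arrives twice, so its second event does no work.
jobs = [
    Job("decode", lambda tid, up: f"waveform({tid})"),
    Job("separate_vocals", lambda tid, up: f"stems({up['decode']})",
        depends_on=["decode"]),
]
store = SharedOutputStore()
log: List[str] = []
for event in ["track-1", "track-2", "track-1"]:
    handle_event(event, jobs, store, log)
```

Keeping outputs in a shared store keyed by job and track is what lets a second team's job (here, `separate_vocals`) reuse `decode` results instead of recomputing them, which is the cost saving the architecture is meant to deliver.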

"Klio is basically a way for folks to engage and build out smarter data pipelines for any type of media," Tyson Singer, VP of technology at Spotify, explained to VentureBeat in a phone interview. "It allows developers and researchers to work in media in a more efficient way."

According to Singer, Klio had its genesis in Spotify's accelerating shift to AI-based research. Over the years, the company has turned to natural language processing, audio models, and filtering to serve up recommendations and curate playlists including Discover Weekly and Release Radar. Just last December in Japan, Spotify launched Sing Along, a karaoke-like feature that uses AI to separate vocals from the instrumental track within minutes of a song joining the catalog. (For context, about 40,000 songs are added each day to Spotify's catalog of over 60 million songs, which are processed on a regular basis.)

"We were starting to hit some challenges and limits with our existing tooling," Singer said. "We were quite concerned because we were getting all this feedback from our researchers that they weren't very happy. They weren't able to be very productive, and it was taking way too long for them to have the impact that they wanted."

Work on Klio -- whose namesake is Clio, the Greek muse of history -- began in early 2019. A prototype came together by the fall, and later in the year, Klio was instrumental in launching a Spotify feature into production. Now, Spotify developers use Klio to string together pipelines that build upon internal work and take advantage of whole-audio-feature APIs.

"We have a lot of smart researchers and they're doing really awesome stuff with music information retrieval, where the machine actually hears the song rather than just human ears and tries to learn from it," Lynn Root, one of the engineers who spearheaded the Klio project, told VentureBeat. "With Klio, you can do a lot more audio processing and optimize it in a way that you don't necessarily have to repeat work. Klio can also build upon other research -- it provides a way for researchers to build upon existing work with good, clean datasets."

Klio is meant primarily for engineers and researchers rather than for people without data science backgrounds. Currently, it takes 50 to 60 lines of code to bring the core platform's capabilities into a project, but Root and Singer say that usability improvements are on the feature roadmap.

"Once Klio is integrated, you can leverage some capabilities in a very simple fashion, like identifying the beat and danceability factor of a song," Singer said. "That's very accessible for folks -- I would say maybe accessible to somebody who isn't an engineer and certainly accessible to, like, a product manager."