AI might soon become an invaluable tool in musicians’ compositional arsenals, if recent developments are any indication. In July, Montreal-based startup Landr raised $26 million for a product that analyzes musical styles to create bespoke sets of audio processors, while OpenAI and Google earlier this year debuted online creation tools that tap music-generating algorithms.

Inspired by this and other recent work, researchers at Sony investigated a machine learning model for conditional kick-drum track generation. Given an existing song and a low-dimensional code that encodes the relationship between that song and the material to be generated, the AI creates a variety of “musically plausible” drum patterns. It can also carry patterns from one song to another irrespective of differences in tempo and time shift (i.e., changes in playback speed or offsets in time).

“We propose a model architecture … that encodes rhythmic interactions of the kick drum versus bass and snare patterns. Each mapping code captures local relations between kick vs bass and snare inputs, such that an entire track is associated to a sequence of mapping codes,” explained the coauthors. “Rather than controlling the characteristics of the generated material directly, it offers control over how the generated material relates to the conditioning material.”

To train the AI system, the researchers compiled a data set of 665 pop, rock, and electronic songs in which the rhythm instruments (bass, kick, and snare) were available as separate 44.1kHz audio tracks. (Contextual signals consisted of two input maps for beat and downbeat probabilities, as well as maps for the onset functions of snare and bass.) Next, they rendered an audio file of the kick drum by placing a drum sample on every amplitude peak that remained after thresholding. They introduced dynamics by scaling the volume of the sample from 70% for peaks at the threshold to 100% for peaks at the maximum value.
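The rendering step lends itself to a short illustration. What follows is a minimal Python sketch, not the researchers' code: the function names, the simple local-peak test, and the linear interpolation between 70% and 100% are all assumptions made for the example, since the article only gives the two endpoints.

```python
def kick_gains(onset_curve, threshold):
    """Return (index, gain) pairs for local peaks that survive thresholding.

    Hypothetical helper: the peak test and the linear gain curve are
    assumptions; the source only states 70% at the threshold and 100%
    at the maximum value.
    """
    peak_max = max(onset_curve)
    span = peak_max - threshold
    hits = []
    for i in range(1, len(onset_curve) - 1):
        value = onset_curve[i]
        is_peak = onset_curve[i - 1] < value >= onset_curve[i + 1]
        if is_peak and value >= threshold:
            # Map threshold -> 0.7 and maximum -> 1.0, linearly in between.
            gain = 1.0 if span <= 0 else 0.7 + 0.3 * (value - threshold) / span
            hits.append((i, gain))
    return hits


def render_kick(onset_curve, threshold, sample):
    """Mix a scaled copy of `sample` into the output at every surviving peak."""
    out = [0.0] * (len(onset_curve) + len(sample))
    for index, gain in kick_gains(onset_curve, threshold):
        for k, s in enumerate(sample):
            out[index + k] += gain * s
    return out


curve = [0.0, 0.5, 0.0, 1.0, 0.0, 0.2, 0.0]
hits = kick_gains(curve, threshold=0.5)
track = render_kick(curve, threshold=0.5, sample=[1.0, 0.5])
```

In this toy input, the peak sitting exactly at the threshold is played at 70% volume, the tallest peak at full volume, and the small peak below the threshold is dropped.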

In a series of experiments, they used the AI system both to conditionally generate drum patterns and to transfer style, that is, to apply rhythm patterns inferred from one song to induce similar patterns in another. Additionally, they created time-stretched versions of songs at 80%, 90%, 110%, and 120% of the original tempo and determined mapping codes for each version.
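To make the tempo factors concrete, here is a small Python sketch of what stretching does to event timing. It is an illustration under stated assumptions: the researchers stretched audio, whereas this example only rescales a list of onset times in seconds, and the names `TEMPO_FACTORS`, `stretch_onsets`, and `stretched_versions` are hypothetical.

```python
# Tempo factors named in the article: 80%, 90%, 110%, and 120% of the original.
TEMPO_FACTORS = (0.8, 0.9, 1.1, 1.2)


def stretch_onsets(onset_times, tempo_factor):
    """Rescale onset times (in seconds) to a new tempo.

    A faster tempo (factor > 1) moves every event earlier, so each time is
    divided by the factor. Working on onset times rather than audio is an
    assumption made to keep the example short.
    """
    return [t / tempo_factor for t in onset_times]


def stretched_versions(onset_times, factors=TEMPO_FACTORS):
    """Build one rescaled copy of the onset list per tempo factor."""
    return {factor: stretch_onsets(onset_times, factor) for factor in factors}


# Kick onsets on every beat at 120 bpm are 0.5 seconds apart.
versions = stretched_versions([0.0, 0.5, 1.0, 1.5])
```

At a factor of 0.8 the gaps between events grow, and at 1.2 they shrink, while the pattern of which beats carry a kick stays the same. That unchanged pattern is what a tempo-invariant mapping code would be expected to capture.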

The researchers shared audio samples of one song, “Gypsy Love,” both before processing and with AI-generated drum patterns.
The team notes that the reconstructions aren’t perfect, in part because of the model’s “invariance,” but they point out that accuracy on the validation set was similar to accuracy on the training set.

“We have shown that the mapping codes are largely tempo and time-invariant and that musically plausible kick drum tracks can be generated given a snare and bass track either by sampling a mapping code or through style transfer, by inferring the mapping code from another song,” wrote the coauthors, who leave to future work applying the same approach to snare drum and bass track generation.

