AI engineers at Amazon have developed a novel way to learn users’ musical tastes and affinities — by using song playback duration as an “implicit recommendation system.” Bo Xiao, a machine learning scientist and lead author on the research, today described the method in a blog post ahead of a presentation at the Interspeech 2018 conference in Hyderabad, India.
Distinguishing between two similarly titled songs (for instance, Lionel Richie’s “Hello” and Adele’s “Hello”) can be a real challenge for voice assistants like Alexa. One way to resolve the ambiguity is to have the assistant always choose the song the user is expected to enjoy more, but as Xiao notes, that’s easier said than done. Users rarely rate songs played through Alexa and other voice assistants, and playback records don’t necessarily provide insight into musical taste.
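The selection rule described above reduces to picking the highest-scoring candidate. A minimal sketch, assuming per-user affinity scores already exist (the function name, the song IDs, and the fallback score of 0.0 for unscored songs are all hypothetical):

```python
def resolve_ambiguous_request(candidates, affinity, default=0.0):
    """Return the candidate song with the highest predicted affinity.

    candidates: song IDs that all match the spoken title (hypothetical IDs).
    affinity:   dict mapping song ID -> predicted affinity for this user.
    Songs with no score fall back to `default`; ties go to the first candidate.
    """
    if not candidates:
        raise ValueError("no candidate songs")
    return max(candidates, key=lambda song: affinity.get(song, default))


user_affinity = {"hello_lionel_richie": 0.2, "hello_adele": 0.9}
print(resolve_ambiguous_request(["hello_lionel_richie", "hello_adele"], user_affinity))
# hello_adele
```

The hard part, as the article goes on to explain, is producing those affinity scores in the first place.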
“To be as useful as possible to customers, Alexa should be able to make educated guesses about the meanings of ambiguous utterances,” Xiao wrote. “We use machine learning to analyze playback duration data to infer song preference, and we use collaborative-filtering techniques to estimate how a particular customer might rate a song that he or she has never requested.”
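The quote does not say which collaborative-filtering technique the team used. As an illustration only, the sketch below swaps in plain matrix factorization trained with stochastic gradient descent: each user and song gets a small factor vector, and the estimated score for a pair the user never requested is the dot product of the two vectors. All names and hyperparameters here are hypothetical.

```python
import random


def predict(U, V, u, s):
    """Estimated score for user u and song s: dot product of their factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[s]))


def factorize(ratings, n_users, n_songs, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user and song factors to observed (user, song, score) triples
    by SGD on squared error with L2 regularization."""
    rng = random.Random(seed)
    # Small random initial factors for every user and every song.
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_songs)]
    for _ in range(epochs):
        for u, s, r in ratings:
            err = r - predict(U, V, u, s)
            for f in range(k):
                uf, vf = U[u][f], V[s][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[s][f] += lr * (err * uf - reg * vf)
    return U, V


# Users 0 and 1 liked songs 0 and 1 and disliked song 2.
# User 2 liked song 0, disliked song 2, and never requested song 1.
ratings = [(0, 0, 1), (0, 1, 1), (0, 2, -1),
           (1, 0, 1), (1, 1, 1), (1, 2, -1),
           (2, 0, 1), (2, 2, -1)]
U, V = factorize(ratings, n_users=3, n_songs=3)
print(round(predict(U, V, 2, 1), 2))
```

Because user 2’s known preferences line up with those of users 0 and 1, the estimate for user 2 and song 1 should come out positive, even though that pair never appears in the data.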
The researchers found a solution in playback duration. In a paper (“Play Duration based User-Entity Affinity Modeling in Spoken Dialog System”), Xiao and colleagues reasoned that people tend to stop songs they dislike and let songs they enjoy keep playing, so play durations supply a dataset on which to train a machine learning-powered recommendation engine.
They divided plays into two categories: (1) songs that users played for less than 30 seconds and (2) songs that they played for longer than 30 seconds. Each play was recorded as an entry in a matrix of users and songs: plays in the first category were assigned a score of negative one, and plays in the second a score of positive one.
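A minimal sketch of that labeling step, storing the matrix sparsely as a dict keyed by (user, song). Two details the article leaves open are filled in here as assumptions: a play of exactly 30 seconds counts as positive, and when a user plays the same song more than once, the latest play wins. The user and song IDs are hypothetical.

```python
THRESHOLD_SECONDS = 30  # cutoff between the two categories in the article


def play_label(duration_seconds):
    """-1 for plays shorter than 30 s, +1 otherwise.
    Treating exactly 30 s as +1 is an assumption."""
    return -1 if duration_seconds < THRESHOLD_SECONDS else 1


def build_affinity_matrix(play_log):
    """Build a sparse user-song matrix from (user_id, song_id, duration_seconds)
    tuples in chronological order. Pairs never played are simply absent.
    Repeat plays overwrite earlier ones (assumption)."""
    matrix = {}
    for user, song, duration in play_log:
        matrix[(user, song)] = play_label(duration)
    return matrix


log = [("alice", "hello_adele", 212),
       ("alice", "hello_lionel_richie", 8),
       ("bob", "hello_adele", 12)]
print(build_affinity_matrix(log))
# {('alice', 'hello_adele'): 1, ('alice', 'hello_lionel_richie'): -1, ('bob', 'hello_adele'): -1}
```

Missing pairs are left out rather than stored as zero, since an unplayed song is unknown rather than neutral; estimating those gaps is the collaborative-filtering step.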
To account for playback interruptions unrelated to musical preference, such as an interruption that caused a user to stop a song just as it was beginning, they added a weighting function. A play received greater weight when the song ran for 25 seconds rather than one second, for example, or for three minutes rather than two, so longer plays counted as stronger evidence on either side of the threshold.
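The article gives only examples of the weighting, not its formula. The sketch below uses `log1p` of the play duration purely as a hypothetical stand-in that matches those examples (weight grows with duration on both sides of the 30-second cutoff), then shows one common way such weights enter training: as per-entry multipliers on the squared error.

```python
import math

THRESHOLD_SECONDS = 30


def play_weight(duration_seconds):
    """Hypothetical confidence weight that grows with play duration, so a
    one-second stop counts for less than a 25-second one.
    log1p is an illustrative choice, not the paper's formula."""
    return math.log1p(max(duration_seconds, 0.0))


def weighted_entry(duration_seconds):
    """Return (label, weight) for one play; label is -1/+1 by the 30 s cutoff."""
    label = -1 if duration_seconds < THRESHOLD_SECONDS else 1
    return label, play_weight(duration_seconds)


def weighted_squared_error(entries, predict):
    """Sum of w * (label - prediction)^2 over (user, song, label, weight) entries,
    so low-confidence plays pull less on the fitted model."""
    return sum(w * (label - predict(u, s)) ** 2 for u, s, label, w in entries)


for d in (1, 25, 120, 180):
    label, w = weighted_entry(d)
    print(d, label, round(w, 2))
```

Any monotonically increasing function would satisfy the article’s examples; the logarithm simply keeps very long plays from dominating the loss.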
When evaluated against users’ inferred affinity scores, the model’s predictions correlated strongly enough to demonstrate its effectiveness, Xiao said. The results also suggest the approach could work for more than music; in the future, the researchers plan to apply it to other content, such as audiobooks and videos.