AI identifies Thai country music from audio and lyrics

Luk thung, a popular subgenre of Thai folk music that emerged shortly after World War II, consists of poetic lyrics often sung with a distinctive vibrato and accompanied by traditional instruments like the khene (mouth organ), phin (lute), and saw sam sai (fiddle). Its aesthetic is distinct in the musical world, and it predictably trips up music classification algorithms trained on Western genres. That's why researchers at Chulalongkorn University in Thailand investigated a system capable of identifying specific types of luk thung songs from lyrics and audio alone.

"Luk thung ... is one of the most prominent genres and has a large listener base from farmers and urban working-class people," wrote the coauthors. "For the purpose of personalized music recommendation in the Thai music industry, identifying Luk thung songs in hundreds of thousands of songs can reduce the chance of mistakenly recommending them to non-Luk thung listeners."

The researchers' system comprised two models, one that classified lyrics and another that classified audio, which fed into a final classifier that aggregated the intermediate features learned by both. To train them, the team compiled a data set of lyrics and audio for 10,547 Thai songs from 1985 to 2019, along with labels denoting mood, tempo, and musical instruments, added by "musical experts." They then constructed word-based features from each song's entire lyrics, from beginning to end, and excerpted a 10-second audio clip from each song's chorus.
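To make the two-models-plus-classifier design concrete, here is a minimal sketch of how intermediate features from a lyrics model and an audio model might be combined. The article doesn't say what form the final classifier took, so the single logistic output unit below is an assumption, and the function name, feature values, and weights are all hypothetical.

```python
import math


def fuse_and_score(lyric_features, audio_features, weights, bias):
    """Concatenate both feature vectors and apply a logistic output unit."""
    fused = list(lyric_features) + list(audio_features)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    # Sigmoid squashes the score into a probability-like value in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical intermediate features and weights, chosen only for illustration.
lyric_features = [0.8, 0.1]
audio_features = [0.3, 0.9, 0.5]
weights = [1.5, -0.5, 0.2, 2.0, -1.0]
bias = -1.0

probability = fuse_and_score(lyric_features, audio_features, weights, bias)
```

In a trained system the weights would be learned from the labeled songs rather than written by hand. The point here is only that the final stage sees both sets of features at once.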

Because luk thung songs span dialects and regional vocabularies, the researchers opted for a "bag of words" approach to lyrics classification, where a text (such as a sentence or a document) was represented as the bag (multiset) of its words, without regard for grammar or word order. The audio model, for its part, was designed to learn the timbral and temporal properties of the song spectrograms it ingested, which are visual representations of how a signal's frequency content changes over time.
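Both input representations are simple enough to sketch in a few lines. The example below is illustrative only and is not the researchers' code. It assumes the lyrics have already been segmented into words (Thai script doesn't put spaces between words, so a real pipeline would need a tokenizer first), and it uses a deliberately slow, naive Fourier transform on a synthetic tone rather than real audio. The function names and sample data are hypothetical.

```python
import cmath
import math
from collections import Counter


def bag_of_words(tokens):
    """Count each token, ignoring grammar and word order."""
    return Counter(tokens)


def magnitude_spectrogram(samples, frame_size=16, hop=8):
    """Return one list of frequency-bin magnitudes per time frame."""
    # Periodic Hann window to soften the edges of each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        frame = [s * w for s, w in zip(chunk, window)]
        # Naive discrete Fourier transform; keep non-negative frequency bins.
        bins = []
        for k in range(frame_size // 2 + 1):
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            bins.append(abs(total))
        frames.append(bins)
    return frames


# Hypothetical, pre-segmented lyric tokens (placeholders, not real lyrics).
lyric_a = ["rain", "field", "home", "rain"]
lyric_b = ["rain", "home", "rain", "field"]
bow_a = bag_of_words(lyric_a)
bow_b = bag_of_words(lyric_b)

# Synthetic tone: 2 cycles per 16-sample frame, so energy should peak in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
spec = magnitude_spectrogram(tone)
```

The two lyric lists contain the same words in a different order, so their bags are identical, which is exactly the information a bag-of-words model discards. Each row of `spec` is one time slice of the tone. A model reading many such rows can pick up on both which frequencies are present and how they change over time.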

So how'd the system perform? According to the researchers, their three-component method "substantially" improved overall accuracy for luk thung classification. Moreover, they say it is well suited to tasks like classifying streaming songs, automatically generating comprehensive lists of luk thung songs for future recommendation, and studying the evolution of luk thung music over time.

"Country songs, which includes luk thung -- bear some resemblance to each other in the distributions of words used in lyrics. This problem may be tackled with document-level, instead of word-level, representation ... Vocals might serve as the main remaining determinant that makes Lukthung differentiable from other genres. Thus, isolating singing voice from instrumental and designing vocal-specific filters may beneficially improve the classification outcomes."