SynVAE AI translates visual artwork into melodies

Synesthesia, which an estimated 1% to 25% of the world's population experiences, is a phenomenon in which stimulation of one sensory pathway (such as hearing) leads to involuntary experiences in a second sensory pathway (such as sight). Those with chromesthesia "see" sounds as color, for instance, while those with lexical-gustatory synesthesia associate words with tastes.

In a recent study that appears to be at least partly inspired by this phenomenon, researchers at the University of Amsterdam investigated an AI system -- Synesthetic Variational Autoencoder, or SynVAE -- capable of mapping characteristics of paintings and other visual art to musical phrases (in the form of MIDI files). They say that in a qualitative trial, human evaluators were able to match MIDI files to the artwork that inspired them with accuracies of up to 73%.

"Art is experienced as a flow of information between an artist and an observer. Should the latter be visually impaired, however, a barrier appears," wrote the researchers. "One way to overcome this obstacle might be to translate visual art, such as paintings, from an inaccessible sensory modality into an accessible one, such as music."

To this end, the researchers devised an AI architecture for translating data from one sensory modality to another in an unsupervised manner (i.e., without paired ground-truth corpora). They compiled a corpus of 180,000 images -- oil and watercolor paintings from the open source Behance Artistic Media data set, along with handwritten digits from the MNIST data set -- which they used to teach SynVAE relationships between visual elements and musical sequences.

In one of several evaluations, human volunteers were tasked with classifying the tone or mood of images using one of three descriptors -- "scary," "happy," or "happy and peaceful" -- solely by listening to SynVAE's MIDI creations. The results show that, the majority of the time, they correctly interpreted artwork they had not seen, suggesting that at least some of the emotion perceived through color and composition can be conveyed through music, even "for complex data."

"[Our work confirms that] ... audio-visual consistency is not only theoretical, but also very perceivable," wrote the researchers. "As shown by our results, it can be concluded with high confidence that SynVAE is able to consistently translate a diverse range of images into the auditory domain of music through unsupervised learning mechanisms. We ... hope that the methodology outlined in this research will provide a solid basis for evaluating unsupervised, cross-modal models, in addition to SynVAE itself enabling more intuitive and inclusive access to visual artworks across sensory boundaries."

"AI art" is a burgeoning field of study that's been pursued by the likes of Adobe and Google, not to mention researchers at MIT and independent data scientists the world over. Adobe AI in June demonstrated an AI system that learns painting styles to reproduce artwork in under a minute. More recently, Nvidia demoed a family of algorithms -- GauGAN -- that creates lifelike images of landscapes that never existed. And researchers at the MIT-IBM Watson AI Lab architected a program that allows users to upload any photograph and edit the appearance of depicted buildings, flora, and fixtures to their hearts' content.