Very recently, an outbreak of AI-doctored pornographic videos, in which the faces of the original actors had been swapped with those of celebrities and politicians, caused panic over the implications of artificial intelligence applications blurring the line between what’s real and what’s not. Similar concerns surfaced at last week’s I/O Conference, when Google demonstrated an AI technology that could create voice patterns indistinguishable from a human’s.
For the most part, the concerns are well-placed. Thanks to advances in machine learning and deep learning, AI applications are becoming extremely convincing at reproducing human appearance and behavior. There are already several applications that can convincingly synthesize a person’s face, voice, handwriting, and even conversation style. And virtually anyone with a computer, an internet connection, and evil intentions can put them to destructive use.
Experts are already predicting how the combination of these applications will help bad actors conduct fraud and forgery, or cause chaos by ushering in a new age of fake news that is hard to verify and debunk.
However, while most of us fret over its evil applications, we’re missing out on the positive uses AI-synthesizing technology has to offer. There are plenty of ways that AI’s imitation power can change people’s lives for the better.
For instance, earlier this year, Montreal-based AI startup Lyrebird helped Pat Quinn, the founder of the Ice Bucket Challenge, regain his voice, which he had lost to amyotrophic lateral sclerosis (ALS), a devastating, degenerative neurological disorder that gradually destroys the patient’s ability to walk, eat, talk, and even breathe.
Lyrebird uses deep learning algorithms to clone a person’s voice. When the company’s research team gives the machine enough samples, Lyrebird can find common patterns in a subject’s voice and use them to generate recordings that never existed before.
The team conducted this effort in collaboration with Project Revoice, an initiative that aims to help ALS patients like Quinn avoid losing their voices. Before deep learning, ALS patients had to contend with generic computerized voices. There were other efforts to recreate patients’ voices, but they required dozens of hours of prerecorded sentences, which the software stitched together in a way that still sounded artificial. In contrast, deep learning can create a digital model of the patient’s voice from a few hours’ worth of recordings and generate speech that sounds natural, with the proper nuances and intonations.
In the case of Quinn, who had already lost his voice, Lyrebird and Project Revoice were able to use the hours of interviews and speeches he had posted online to create his voice model. The results still sound a bit unnatural and are noticeably synthetic. But for Quinn, who had been using a generic voice to communicate, the difference was dramatic. “After hearing my voice through this new technology, I was blown away! For patients to know they can have their own voice after ALS takes it away, it will change the way people live with ALS,” he said.
Quinn’s story might help shed light on the positive aspects of an industry that has taken much flak for the creepy and unethical uses of its applications. “It’s important that people realize the bright side of this technology,” Lyrebird cofounder Jose Sotelo said.
Voicery, a San Francisco-based startup also doing voice synthesis, is providing brands with customized digitized voices powered by AI algorithms. The human-sounding AI voice can replace the dry, emotionless voices you hear on customer service calls. Companies can also use the technology in a wide range of voice-enabled devices such as smart speakers, smartphones, and self-driving cars that can interact with their owners through speech.
Google is also using WaveNet, its AI-powered voice synthesizer, to create a more natural experience for people who interact with its Google Assistant and its related products.
Other areas that can benefit from synthesized voices are automated text-to-speech applications and audiobooks. “The problem with text-to-speech with media use is you can’t listen to it for very long because it’s repetitive and boring,” Voicery’s CEO and cofounder Bobby Ullman said in an interview with Fast Company. “With this new technology, it sounds much more realistic and it’s much more enjoyable. It’s creating a new market. It could change the way people consume media.”
To be clear, none of these productive uses cancel the threats that AI-synthesizing technology will pose as it becomes better and better, a reality that is not lost on the creators of AI applications.
An ethics page on Lyrebird’s website previously acknowledged that the technology could “potentially have dangerous consequences such as misleading diplomats, fraud, and more generally any other problem caused by stealing the identity of someone else.” To drive the point home, the company’s website features several synthesized recordings created with the voices of Donald Trump and Barack Obama.
“As these tools get better, you have to care about the ethics, and it’s important that people maintain ownership of themselves and their voice,” Voicery’s Ullman said to Fast Company.
There are a number of measures that can help minimize the negative uses of AI applications and prevent scammers from using them for evil purposes. For starters, companies that develop these applications must educate users about the capabilities of AI algorithms in imitating humans. Last year, IBM Watson’s CTO Rob High told me in an interview that companies must be transparent about whether the agent a user is interacting with is human or AI. “Not just so the end user has that clarity, but more specifically to reinforce the importance that the user reveal themselves only in a way that they feel comfortable,” he said.
Legal safeguards will play an important role in disincentivizing the use of AI-synthesizing technology for evil purposes. Lawmakers in the U.S. are looking into the issue and are exploring different solutions to rein in the malicious use of AI applications.
Technology will also be key. Researchers from Germany’s Technical University of Munich have developed a method that uses some of the same techniques found in AI-synthesizing applications to detect AI-doctored media.
But at the end of the day, we need to realize that we’re entering an age where an AI algorithm can produce anything we see or hear, no matter how convincing it looks and sounds. The threats are real, but so are the opportunities. We must work to minimize the tradeoffs while making the positive uses available to more and more people.
Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems.