
With half a dozen new products announced by Amazon last week and more to be unveiled by Google this week, devices you can speak with are proliferating.

These devices have been targeting your home for a while now — Google’s speaker is literally named Home and Amazon’s new devices were unveiled at Amazon HQ in a mock home setting — but things are about to get a lot more personal.

Wireless headphones are launching with the promise of putting an AI assistant in your ears at all times. Google is reportedly planning to release earbuds with Google Assistant inside in the coming days, two weeks after the release of Bose’s QC35 II with Google Assistant. Amazon is also reportedly working on inner ear tech for Alexa without headphones.

This shift matters. Qi Lu, chief operating officer at Baidu, who previously helped make Cortana at Microsoft, told Wired that the reason Amazon pulled ahead of competitors whose assistants predated Alexa by years is that the Echo is an AI-first device. By that he means voice, image recognition, or facial recognition are the first modes of interaction, and touch is optional or secondary. Unlike smartphones or computers, the smart speaker encourages voice interaction from the start.


Just as hands-free smart speakers disrupted the digital assistant game, wireless headphones could make interaction with AI assistants that much more appealing and personal.

As voice assistant leaders like Amazon and Google begin to make devices that enter our ears, developers and third parties should be invited to create personalities for AI assistants, and tech giants should give users the right to pick their assistant’s personality.

Doing so could accelerate AI assistant adoption and hasten what Siri co-creator Adam Cheyer calls a decade-long shift toward voice as the primary way people fulfill their computing needs.

The Her effect

Natural language understanding isn’t new. In fact, it’s decades old, but we seem to be inching closer to the moment when the promise of Her is within reach. To be clear, we haven’t reached the point where an assistant can pull off conversation or humanlike exchanges as well as Scarlett Johansson’s character Samantha in the film. AI assistant jokes in a stand-up comedy setting make that clear enough, but many fundamental elements appear to be in place.

Chief among them: AI assistants are making their way into earbuds and headphones, speech recognition error rates are falling, and many of the world's biggest consumer tech companies are investing in the technology.

If voice achieves the paradigm shift Cheyer predicts, 2014 won't just be remembered as the year the Echo made its debut but also as the year Her hit theaters.

In my work covering assistants like Siri and Cortana and the third-party voice apps like Alexa skills being built for assistants, the innovators and dreamers I speak with mention the fictional assistant in Her more than any other milestone or archetype. In the movie, Samantha achieves such a personal, humanlike bond with Joaquin Phoenix’s character Theodore that he falls in love with his AI assistant. But he didn’t do that by speaking with a smartphone or a speaker like the Echo.

He did it with a pair of wireless headphones that were always in his ears.

Above: Joaquin Phoenix in “Her”

Image Credit: Warner Bros. Pictures

For Theodore, Samantha didn’t start out as a love object, but rather as a cure for loneliness. And Alexa AI senior project manager Ashwin Ram believes Amazon’s AI assistant could help cure loneliness, too. In looking at the data from people’s interactions with Alexa, Ram said, they don’t just yell commands at her, they try to chat with her and joke with her, and they tend to see her “not just as an assistant but almost as a friend or family member.”

In response to a question about the kind of bots he wants to see built, Ram said, “I think that the app that I would want is an app that takes these things from being assistants to being magnanimous, being things we can talk to, and you imagine it’s not just sort of a fun thing to have around the house, but for a lot of people that would be a lifesaver.” He also noted: “The biggest problem that senior citizens have, the biggest health problem, is loneliness, which leads to all kinds of health problems. Imagine having someone in the house to talk to — there’s plenty of other use cases like that you can imagine — so I would want a conversationalist.”

An antidote to loneliness

Ram spoke on a panel last month titled “Say ‘Hello’ to Your New AI Family Member” with Gummi Hafsteinsson, leader of Google Assistant; Cortana lead Jordi Ribas; and Siri co-creator and Viv Labs cofounder Adam Cheyer. The panel was brought together by Menlo Ventures, an investor in pre-Apple Siri.

Backing up this idea of AI companionship, the company Next IT has reported that Jenn, its digital assistant on the Alaska Airlines website, sometimes sees an influx of activity during evening hours from people who just want someone to chat with.

To Ram’s point, loneliness is a major health risk, increasingly recognized as potentially worse for you than obesity or smoking. A recent analysis of almost 150 studies covering more than 300,000 people found that those suffering from loneliness or isolation are 50 percent more likely to die early.

Many people’s first reaction to the idea of a digital assistant being treated like a family member or easing isolation might be disbelief, but it turns out plenty of people don’t mind speaking with a computer.

People sometimes wonder how Cortana performs on the Turing test, Ribas said. (Alan Turing proposed the test in 1950 to gauge whether a computer could convince a human that they’re speaking with another human.) That’s actually not all that important, Ribas said, because for some people that moment has already passed, or it simply doesn’t matter to them.

“It’s already gone, because we’ve already got chatbots like Xiaoice in China, that talks to tens of millions of people [a month], and every conversation on average is 30 interactions back and forth,” he said. “I think some of you would probably use it and say ‘Well that wouldn’t pass the Turing test for me,’ but definitely there’s something going on, it’s happening, and tens of millions of people are finding it helpful.”

In movies like Interstellar and Star Wars, characters are fully aware that they are speaking with a robot but still treat it as a companion. In Her, Joaquin Phoenix’s character knew he was speaking with an operating system and yet he not only found companionship, he fell in love.

And chatbots that fulfill social functions are already widely available. Xiaoice is one of a handful of conversational bots made by Microsoft. There’s also Rinna for Japanese users, and, of course, there was Tay.

In 2016, Tay was made in Xiaoice’s image for the U.S. market and seemed poised to become a major part of Microsoft’s conversational computing and Microsoft Bot Framework initiatives. But Microsoft made the mistake of letting the language the public used with Tay shape her word choices, and after less than 24 hours on Twitter, Tay had become a racist, sexist catastrophe and had to be shut down. Tay’s successor, Zo, has garnered fewer headlines; it emerged to chat with people in the U.S. in December 2016 and now counts hundreds of thousands of users.

Ram reinforced the idea that we can bond with bots without believing they are human.

“This is not about trying to build machines that will trick people into thinking they’re human or building tasks where the goal of the human is to trick the computer into revealing itself,” he said. As part of his work at Amazon, Ram leads the Alexa Prize, a $2.5 million university competition to build a bot that can hold a 20-minute conversation. You can speak to these bots now by saying “Alexa, let’s chat.”

“We deliberately did not choose the Turing test as the criteria because again it’s not about trying to figure out if this thing is human or not. It’s about building a really interesting conversation, and I imagine that as these things become intelligent, we’ll not think of them as human but find them interesting anyway. And so I think the Turing test is a bit of a red herring to go after.”

A personality made just for me

But even if people are willing to overlook an artificially intelligent companion’s … artificiality, the quality of the interaction is important, which is where personality comes in. And just as individual people vary, there can be no one-size-fits-all solution when it comes to designing an AI personality.

If I want a sassy Siri, I should be able to get a sassy Siri … or a Siri that’s snarky or a dirtbag or a bit of a curmudgeon. All that sounds appealing to me, and more interesting than today’s “get stuff done” default for everyone. There’s nothing wrong with getting stuff done, but users should have options.

Now, do I want that personality to give me sass when I’m clearly in a hurry, at work, or low on battery? Probably not. “Personality” is a luxury that can’t always be enjoyed when I’m in “get stuff done” mode, but even then, I appreciate a conversation that feels more natural.

Tech companies are certainly aware of the need to push their assistants further when it comes to personality. As part of the iOS 11 upgrade that started rolling out last month, Siri got a new voice with more expression. This was achieved by mapping phonemes, the distinct units of sound in speech, piece by piece; machine learning was then used to stitch those sounds together. Upgrades have also been made to Alexa’s voice to add humanlike utterances like sighs, pauses, and the occasional “Boom” or “Booyah.”

Phoneme mapping, the technique used to give Google Assistant and Alexa expressive voices, paired with an understanding of the personality traits each individual user appreciates, could result in a very different AI assistant experience.
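To make the phoneme-mapping idea concrete, here is a minimal sketch of the lookup-and-stitch step in concatenative synthesis. This is not Apple's or Amazon's actual pipeline: the tiny pronunciation dictionary is invented for illustration, and a plain string join stands in for the machine-learning models that select and smooth recorded units.

```python
# Toy sketch of phoneme mapping in concatenative speech synthesis.
# A real system uses a large pronunciation dictionary and learned
# models for unit selection and smoothing; this only shows the
# text -> phonemes -> stitched-output flow.
PHONEME_DICT = {
    "hello": ["HH", "AH", "L", "OW"],  # ARPAbet-style symbols
    "world": ["W", "ER", "L", "D"],
}

def to_phonemes(text):
    """Map each word to its phoneme sequence (unknown words get <unk>)."""
    units = []
    for word in text.lower().split():
        units.extend(PHONEME_DICT.get(word, ["<unk>"]))
    return units

def stitch(units):
    """Placeholder for unit selection and smoothing: just concatenate."""
    return "-".join(units)

print(stitch(to_phonemes("Hello world")))
```

The expressiveness described above comes from replacing `stitch` with a learned model that chooses among many recordings of each phoneme and smooths the joins, which is where a voice picks up sighs, emphasis, and tone.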

Let’s say there’s a certain type of personality you like having around at work. Wouldn’t working with an assistant modeled after those personality traits be better than interacting with a dry or robotic default personality?

Maybe a stern voice is the best approach in the morning as I begin to chase the day and a soothing voice is optimal in the evening to lower my blood pressure and get me ready for bedtime.

It’s not as if the tech to identify personality in conversation doesn’t exist. Corporate customer service call centers, drawing on analysis of millions upon millions of conversations, can recognize your personality from your word choice within 30 seconds. Alexa and the like could learn to do the same.
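The word-choice idea can be sketched with a simple bag-of-words scorer. Everything here is hypothetical: the trait names and lexicons are made up for illustration, and production systems learn these associations from millions of labeled conversations rather than hand-picked word lists.

```python
from collections import Counter

# Hypothetical trait lexicons; real systems learn these associations
# from large volumes of labeled conversations.
TRAIT_LEXICONS = {
    "assertive": {"must", "now", "immediately", "need"},
    "warm": {"thanks", "please", "appreciate", "happy"},
}

def score_traits(transcript):
    """Count how many words from each trait's lexicon appear."""
    words = Counter(transcript.lower().split())
    return {
        trait: sum(words[w] for w in lexicon)
        for trait, lexicon in TRAIT_LEXICONS.items()
    }

def dominant_trait(transcript):
    """Return the trait with the highest lexicon-overlap score."""
    scores = score_traits(transcript)
    return max(scores, key=scores.get)

print(dominant_trait("thanks so much, I appreciate the help"))
```

An assistant running something like this over a few turns of dialogue could, in principle, pick a matching voice and register; the hard part is the scale and quality of the training data, not the scoring loop.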

It’s not as if these platforms lack behavioral data on us all.

Major tech companies are, however, working to boost their assistant’s broad appeal. The Google Assistant personality team is led by chief doodler Ryan Germick, together with writers formerly of The Onion, Pixar, and other creative outlets. This year, its personality has done things like share New Year’s resolutions or tips for dads on Father’s Day.

The personality team at Amazon is why Alexa is able to sing to you, crack jokes, and tell you who will win the Super Bowl or the Academy Award for best actress. When approached by VentureBeat, none of the big companies would share the size of their personality-focused teams, but they all have in-house people working to animate their AI assistants.

While that’s all impressive and lays an important foundation, these assistants are very much the creations of marketing, walled gardens, and partnerships struck by tech giants with budgets rivaling those of nation-states. The reason Alexa plays games with you, the reason she answers the question “Are you a feminist?”, is that she wants you to eventually do some shopping.

Compounding this tension, the tech giants leading the charge in human-computer interaction aren’t very sympathetic right now.

Google, Facebook, and Amazon (Facebook is reportedly working on its own voice assistant) are often associated with the term “antitrust,” and with mountains of data on all of us, they appear to be compounding their power in the AI age.

Sitting across from a computer screen or holding a smartphone while interacting with a tech giant’s services is one thing. An assistant in your ear with the power to run your life may be another entirely. The idea that these companies would decide the personality of the assistant spending so much time in my ear gives me pause, especially considering the control each of them already has over the lives of billions of people.

To open the playing field and give consumers real choice, third-party developers should be encouraged to make personalities that sit atop the giant stack of tech that powers AI assistants. I want to go to an open market, something akin to the App Store, to buy and download a personality or have one custom-made based on my old football coach or best friend.

Based on industry-leading assistants like Siri and Alexa, we have yet to reach a Her moment as it relates to the ability to carry on a conversation or exhibit deeply human expressiveness. But as wireless headphones with AI assistants inside become commonplace and technological advances continue to compound, it seems clear we’re on the brink of a time when assistants with rich personalities will be nestled in our ears.
