At the AI-focused Transform event held by VentureBeat in Mill Valley, California, Google VP Scott Huffman, who is in charge of engineering teams for Google Assistant, shared some insights into what it takes to create lasting experiences with voice assistants. Becoming part of a person’s daily routine helps drive adoption, for example, and Google Assistant commands like “Create a reminder” or “Play music” are 40 times more likely to be action-oriented than a Google search query.
Huffman did a great job of sharing unique insights from a platform perspective, but that’s just one side of the story. On the other side are a host of developers, startups, and service providers making their own third-party experiences that work alongside Google Assistant or Alexa.
Below, three industry veterans offer tried-and-true advice for successful voice computing.
Hype destroys value
Perhaps more than any other portion of the tech industry, bots and artificial intelligence have made great strides in the past few years while simultaneously suffering from overhyped and even false claims. After a while, it can become tough to tell truth from fiction.
That sort of exaggerated marketing has consequences, said Omar Tawakol, CEO of Voicera, which makes meeting assistant Eva.
“I hate hype because hype destroys value. Because in the long run, when you have a good innovation, it usually will exceed what you previously thought, but in the short run, it gets so overhyped that you get irrational behaviors on both sides — on the upside with the investors and on the downside with people not being patient enough to take their innovation to the end — so you’ve got to figure out some hype filter,” he said.
Hype around AI assistants is nothing new, agreed PullString CEO Oren Jacob. PullString is an agency that helps clients create voice experiences for Alexa and Google Assistant.
“The overpromising of the industry has been threatening since the first TV broadcast of Siri,” Jacob said.
Not surprisingly, Tawakol said his company’s ability to deliver on promises is a good predictor of whether a user will choose to upgrade to the paid version of its transcribing service.
“Accuracy and relevance are just core, particularly if you’re going to forward this email to somebody, it better be good, and I don’t think we’re good enough. Every week, two weeks, quarter, we’re better and better, and we’re still not at the point where I feel like the industry has delivered on its promise,” he said.
Shortcomings as opportunities
Alpine.ai makes voice apps for brands and corporations. CEO Adam Marchick said that before he created the company, his voice analytics platform was used by more than 3,700 voice app developers to track performance and find out what works.
Today his agency is helping Petco build the PetCoach Google Assistant action and Alexa skill that can tell users what things are safe for pets to eat or ingest.
“Now there’s no chance we’re going to have a 100 percent match rate, and we’ve seen that from all the analytics,” he said.
The action can’t answer every question, Marchick said, but voice app developers don’t have to let every difficult query lead to a frustrated user.
“Instead of saying ‘Can you repeat that?’ or getting it wrong, PetCoach says ‘That’s a great question. We don’t have an answer for you right now, but our vets would love to answer. If you leave your phone number, we’ll text you when we have the answer,’ because it’s such a high-value search query for that customer and it will drive affinity and in-store visits,” he said.
This approach may not suit the monetization strategy of the vast majority of voice apps, since it requires speedy upkeep and support, he said, but it’s one way to work around shortcomings, make customers feel taken care of, and grow the knowledge base that feeds the app’s intelligence.
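The fallback Marchick describes can be sketched in a few lines of Python. This is a minimal illustration, not PetCoach's actual implementation: the `KNOWN_ANSWERS` table, the `FollowUpQueue` class, and the `handle_question` function are all hypothetical names, and the stored answer is a placeholder. It only shows the shape of the flow: answer when there is a match, otherwise offer a follow-up and queue the question for a human expert.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical knowledge base; the entry is a placeholder, not veterinary advice.
KNOWN_ANSWERS = {
    "can my cat eat eggs": "(placeholder answer from the knowledge base)",
}

# Wording taken from the fallback response quoted in the article.
FALLBACK_PROMPT = (
    "That's a great question. We don't have an answer for you right now, "
    "but our vets would love to answer. If you leave your phone number, "
    "we'll text you when we have the answer."
)


@dataclass
class FollowUpQueue:
    """Unanswered questions waiting for a human expert (hypothetical)."""
    pending: list = field(default_factory=list)

    def add(self, question: str, phone: str) -> None:
        self.pending.append((question, phone))


def normalize(utterance: str) -> str:
    """Lowercase and trim surrounding punctuation so lookups are less brittle."""
    return utterance.lower().strip(" ?.!")


def handle_question(utterance: str, queue: FollowUpQueue,
                    phone: Optional[str] = None) -> str:
    """Answer from the knowledge base, or fall back to a human follow-up."""
    key = normalize(utterance)
    if key in KNOWN_ANSWERS:
        return KNOWN_ANSWERS[key]
    if phone is None:
        # No match and no contact details yet: offer the follow-up.
        return FALLBACK_PROMPT
    # The user left a number: queue the question for an expert.
    queue.add(key, phone)
    return "Thanks. We'll text you when we have the answer."
```

In a setup like this, answers written by the experts could later be added to the lookup table, which is one way the unanswered questions would grow the knowledge base over time.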
Voice app discovery is tough
AI assistants are coming to a growing number of visual surfaces, like televisions and smart displays, but it’s still hard to encourage adoption of third-party voice apps for an interface with no home page.
Just because voice commands for direct access to a voice app are available doesn’t mean people use them beyond the most popular use cases, such as setting a timer or listening to music or podcasts, Marchick said.
“The hook of forming habits for a brand or retailer, or even a game … to form a habit without a UI and an [expected] command isn’t really happening,” he said. People would be more likely to find the PetCoach skill by asking “Can my cat eat eggs?” than by saying “Ok Google, talk to Petco.”
Accordingly, Google and Amazon have begun to recommend voice apps in response to natural language questions.
Jacob said that, unfortunately, most options available to consumers today aren’t great.
“I would comment that a whole lot of skills and actions basically suck, and the usage of them is quite poor across both returning users and amount of time spent in them, broadly speaking,” he said.
One way voice apps could help grow offerings from AI assistants is by picking a specific subject and diving deeply into it, rather than trying to be like Alexa and offer generic factual answers to a broad range of questions.
An example of this is the Westworld Alexa skill by PullString.
Because it goes deep, some users have spent more than an hour playing the game, Jacob said.
“It’s a particularly directed fiction-based experience that therefore can be bounded, and if you can bound an experience and constrain it left and right, you can go deep because you’re not bouncing off here or bouncing off there to answer anything about the universe in general. And that is the tension of the promise in assistants like Alexa or Siri or Cortana or others,” he said.
One size does not fit all
Voicera has the rare distinction of being an AI company with investors from some of the biggest competitors in AI, including GV (formerly Google Ventures) and Microsoft Ventures.
But unlike PullString or Alpine.ai, it’s attempting to create services for enterprise customers, rather than consumers.
When the company first started to offer its Eva assistant that transcribes meetings and highlights action items, participants in a meeting had to proactively speak up and say things like “OK Eva, this is an action item, remind me to send a copy of the presentation, thanks Eva.” Within a month of launch, users let Voicera know that was a bad idea.
One CEO, Tawakol said, pulled him aside and said to get rid of the command structure because he didn’t want to interrupt a meeting with 10 to 15 people to note an action item.
“It kind of taught us that you can’t copy the usage models for consumer use versus the enterprise,” he said.
Voice is hard
Unlike the precision you can expect from a keyboard or from tapping buttons on a screen, a voice interface comes with a series of challenges and nuances not present in other user interfaces.
“In language, you have the brilliant ambiguity of human language coming at you all the time,” Jacob said. “Topic change is fast and immediate; there’s no continuity in voice in a conversation. Spoken language is made up of fragments, uhs, ahs, and stutters.”
Ensuring assistants understand people with accents will likely require years of work for major voice computing platforms.
And, generally speaking, the voice interface comes with challenges that aren’t as easy to pin down as those on interfaces with keyboards or touch screens.
Consider Amazon and Google’s business choices
Be smart about the vertical you choose, Marchick said. You might be able to rely on Amazon to not cut into your business, for example, but the same can’t be said about other platforms.
“I think games is [a] very safe [vertical]. If you build a compelling game, all they’re going to do is promote you, because it sells more devices. It leads to more daily active users, more engagement. If you go to travel or local business, you’re going to really have to navigate the waters on how much traffic you’ll get … as Yelp found out on the web — but it’s times 100 [more difficult] with voice.”
Jacob agreed with this assessment, adding that building an ecommerce experience on the Alexa platform should give you some pause. He suggested having a conversation with Amazon before pursuing such an agenda.
Who is responsible when things go wrong?
Developers and brands making voice apps, along with the makers of AI assistants that host third-party apps, have to figure out where responsibility lies when an experience fails.
“I think you have seen and will see different tech giants making different decisions about where that line is, how much dialogue they’re responsible for on the voice OS side versus how much a third-party developer is. It’s not as clear as on mobile, and not as clear as web,” Jacob said. “The fact that you’re asking Google [a question] but Petco is answering — it is on both sides of that line.”
Operating systems for voice apps are “quite a dynamic moving place in the market right now, and developers and platforms alike will have to work together to integrate the best contributions third-party developers can bring to assistants like Alexa and Siri.”
“We’re pushing back and forth on each other now as this field sorts stuff out. That is a very important thing to watch as we go forward to decide how best we can all contribute to this space,” Jacob said.
Updated Aug. 30 at 11:50 a.m. Correction: The initial version of this article stated that the Westworld voice app made by PullString is a Google Assistant action; however, it’s an Alexa skill. We regret any confusion this may have caused.