Why the next Amazon Echo and Dot will have a screen

2017 is already going down in history as the year that voice computing went mainstream.

Amazon leads the pack, selling well over 8 million Echos and Dots in just two years and leveraging the Alexa Voice Service (AVS) platform to get Alexa into everything from refrigerators to dancing robots to Ford F-150s (the best-selling vehicle in the U.S. for 40 years).

Combined with other voice computing products, like Google Home, and the potential launch of an Apple Siri speaker this summer, it’s not out of the question that over 25 million more voice devices will ship this year. Despite this growth, voice computing is already showing some core problems in user retention and discovery. According to a new study by VoiceLabs, new skills/actions lose 97 percent of their users in just two weeks, while fewer than a third of the 10,000 Alexa skills have more than one review. But this isn’t because voice computing is failing. It’s because voice is only a part of the coming ambient computing revolution.

“Ambient computing” refers to making the capabilities of a place, such as a home, directly accessible to anyone present, without the need for an intermediate device, like a mobile phone or computer. If you have ever stood in your kitchen and asked Alexa to play music or turn on the lights, you’ve used ambient computing. (Incidentally, these are the two most common uses of Alexa, each comprising 30 percent or more of Alexa requests.) If you’ve ever had lights with motion sensors turn off when you aren’t in the room, or armed your security system using a wall keypad, you have also used ambient computing. Voice computing is just one of many ways that you can interact directly with your environment.

Voice computing works well for direct interactions when you know exactly what you want, such as asking for a weather forecast, but is critically lacking in other interactions, such as choosing from a list of options, reviewing information, or discovering what capabilities are available. General purpose ambient computing devices will have a range of interfaces adapted to relevant use cases and consumer preferences.

This raises the question: Given a range of ambient computing capabilities, which interface will get used the most? We recently did a pilot test of Brilliant Control, a smart home control panel. Analyzing thousands of interactions in households that had voice services turned on, we found that voice was used 14 percent of the time, while touch was used 81 percent of the time, and motion about 5 percent of the time.

Why was touch used most often? Three factors stood out. First, simplicity. It’s still easier to flip on lights with your finger when you walk into a room. Second, choice. Selecting between options, such as songs/playlists/channels for music players, is far more natural with a screen. Third, interactive feedback. It’s much faster to adjust dimming levels for lights or sound levels for music with the slide of a finger than by issuing successive commands until you find the right level.

Voice computing still had a role. In fact, household use of voice computing actually increased overall, for the simple reason that Alexa could be reached in more rooms throughout the house. Voice computing has an important role in the home of the future, but it is not a complete solution in itself.

If the next Amazon Echo does include a screen, it will provide an effective interface for a much broader range of ambient computing interactions than voice alone. This will help break through the discovery and retention challenges that voice computing has today and solidify Amazon’s lead. The company’s recent announcement of “display cards” shows it is headed in this direction by giving AVS partners the ability to return visual data from a voice command.

The Amazon Echo Look device, which combines Alexa with an AI-driven camera to help you choose your outfits, demonstrates Amazon’s ability to execute. If Amazon combines its back-end AI engines for voice, display, touch, and visuals, it could yield some exciting (and perhaps deceptively simple) interactions. This would take ambient computing to a whole new level and quite literally make Alexa a household name.

By 2018, over 50 percent of households will have adopted smart home products. The full realization of ambient computing will make it fun and delightful to interact with our homes, rather than forcing us to rely on the clunky mobile phone-driven or voice-only interactions that exist today. We will see this in products from big players like Amazon but also from the companies they power with their open platforms, such as our company. If Google and Apple also open their platforms to third parties, I think we will be surprised at how quickly homes without ambient computing begin to feel outdated and frustrating. It only took eight years for mobile phones to go from being simple voice/text devices to becoming mandatory always-on companions that simplify day-to-day life for billions of people. Ambient computing is already on its way to being the next big wave.

Aaron Emigh is the CEO and cofounder of Brilliant, a smart home company.
