
“Close your eyes, listen to my voice.” That’s what a meditation coach says to a beginner of guided meditation. When we need to focus, we turn off our visual input, and let our minds do the work.

Sensory inhibition can liberate our minds in profound ways. Some of humanity’s greatest creators, from John Milton to Ray Charles, have flourished after losing their sight, demonstrating how our dominant senses, in this case, our visual faculties, are not required for even the richest mental experiences. The absence of visual stimuli can open up worlds of cognitive possibility, a truth that I believe underpins a momentous sensory shift underway in how humans interact with technology, each other, and the world.

As we approach a new decade, our collective experience is more visual than ever. We’re glued to our screens throughout our days and nights. On social media, we increasingly traffic in photos and videos and often place limits on character and word counts when we write.

The ability to see, while central to our evolution and abundantly helpful in navigating the modern world, has emerged as something of an attentional Achilles’ heel. Large internet companies regularly exploit our eyes in order to capture and sustain our focus and, in a great many cases, subvert our intentions.

Disharmony with devices and digital media in general is no secret. As consumers, we feel it in our stiff necks, eroding attention spans, and anxiety when away from our devices (and while on them). Indeed, the technology leaders who build addictive digital products are often the most acute observers of their risks, leading them to take pre-emptive action to shield their loved ones from technology’s potential downsides.

As a leader whose company works with the biggest tech companies every day, I have a front-row seat to this tension, at work and at home. Thanks to my smartphone, my daughter is closer than I could have imagined to her great-grandmother hundreds of miles away. At the same time, the device distracts her attention (and mine) when we’re mere feet apart.

Because I work in the voice technology space, my home is laden with voice assistants: Alexa, Google Assistant, Siri, to name a few. And while I may be a power user for professional reasons, I’m not alone in this sweeping trend. Nearly 40 million U.S. homes have a voice-activated assistant, and by 2022, it is estimated that more than half of U.S. households will have one.

Over the past two years I have observed a curiously profound difference in how my family and friends interact with voice technology compared to screen-based media. My daughter still engages with voice assistants quite frequently, but voice does not disrupt my household the way screens do. Whereas mobile and tablet devices are individual by design — therefore more isolating — voice is inherently inclusive and participatory. It’s a social experience that brings my whole family in.

My daughter can ask an assistant to play a song or translate a word into Spanish in the same moment she’s doing a puzzle or playing with her little brother. Likewise, if she asks a question that I can’t answer (and she often does: what is hotter than lava?!), my new recourse is to ask an assistant with her. Rather than sucking me into the digital quicksand of a device, voice tech acts like a trampoline, bouncing me right back into the moment. She’s not isolated, and neither am I.

These observations made me wonder: What’s accounting for the vast differences in how we use voice-first tech compared to screen-first tech?

Screen-based tech is addictive because of vision’s position in the hierarchy of our senses

It helps to start with what makes mobile devices so fundamentally distracting. It’s not just the presence of a screen. In fact, many voice devices also have a screen, foremost among them our phones. It’s the role the screen plays in the UX of the device, and in the hierarchy of our senses as users. “Screen-first” experiences, those in which the screen is the primary modality for input and output, are distracting because visual input is complex; humans are programmed to take in as much of it as possible.

This links back to an innate need to connect. Human survival relies on our ability to understand others. We’re constantly trying to read others’ emotions, behaviors, and actions to understand motivation and intent. And much of this social sensory input is visual (by one common estimate, 65% of communication is non-verbal). From birth, babies learn to detect and recognize faces long before they can make sense of most other information.

Screen-first devices capitalize on this reflex and have a way of methodically sucking us back in. Push notifications are designed to target our brain’s executive function, specifically the “bottom-up” brain signals that take priority over the things to which we consciously choose to pay attention. This reaction is reflexive and difficult to override. The result is the semblance of productivity and the rise of the multitasking myth.

Distraction stunts innovation, the very thing technology seeks to push forward

The truth is, most adults can’t handle distraction. About 98% of the population is unable to process more than one string of information at a time. The effects of jumping between tasks can eat up as much as 40% of our otherwise productive brain time. Productivity and mental well-being are not the only casualties of distraction. It goes further.

This cerebral breakdown has profound implications for us as makers, that is, for our intricate process of creation. Great thinking comes from the ability to immerse ourselves in information and stimulus and then consciously step back from it, giving the brain time to simmer and make those needed connections. It’s why some of our best ideas come to us in the middle of the night. In order for this to happen, however, information must be “saved to disk” in our brains. Task switching impedes how we process and save information, inevitably stunting the brain’s ability to make insightful connections over time. Just think: Moments that were once free from technology and available for making these connections (walking the dog, waiting in line, getting ready in the morning) are now often occupied by smartphones or other screen devices. Ironically, technology is threatening the very thing that powers it: innovation.

So where does voice fall into all of this? By its very nature, voice technology presents a powerful opportunity to flip the script on our fractured relationship with technology.

Voice is faster in two ways

In English, speaking is 3x faster than typing. It’s faster to ask a digital assistant a question than to type into a search bar or text message. This has been widely recognized as one of the killer applications of voice. Many companies are already leveraging this benefit to bring spoken input efficiency to operations and employee experience.

But voice is also more effective for a reason that has nothing to do with your voice at all. Our sense of hearing is also faster — and this is the other half of the “voice equation.” Auditory reaction time is 4x faster than visual reaction time, meaning we can process information with our ears much faster than we can with our eyes.

Both input and output are faster with voice. Compound that efficiency with the time saved by dodging the rabbit hole of screen-based visual distraction. When using voice, we not only process information faster, we lower the risk of new information competing for our attention and ability to save to disk.

This is not to say that voice and audio-led experiences cannot be engaging. Let’s look at a simple dynamic, say, reading or listening to a story. When visual imagery is not the focus, we create and visualize characters and stories in our minds. Our brains work to translate and interpret words, and the intent behind those words. We create our own cinema of the mind. Podcasts and page-turners can powerfully capture and sustain our attention in ways that unleash our imaginations, without the visual trance induced by moving pixels. Engaging does not have to be synonymous with addictive.

Sensory revolution: Shifting our conscious center of gravity from sight to sound

Voice has the potential to transform and enhance our relationship with all technology by pushing it to the background.

Rather than being a shiny new object or device, it’s an infrastructure – a “thin layer” that will give us greater control over how we engage with technology and use our precious cognitive resources. Imagine the productivity gains of your voice-computing-enabled commute. The rekindling of sustained, face-to-face intimacy around friends and family. The insights! What we can discover about ourselves, our work, and the world. All of this surfaces when we shift our conscious center of gravity from sight to sound and give ourselves time to think freely and unimpeded.

If we’re doing our job right, voice technology will be the catalyst for making the world smarter, without making us dumber, and making technology invisible while driving human connection forward.

We’ve got a ways to go. But that’s the kind of vision we can all get behind.

Nithya Thadani is CEO of RAIN, a firm specializing in voice strategy, design, and development.
