Over the past year, millions of us have welcomed Alexa into our homes. Ever since the launch of the Amazon Echo, consumers have flocked to voice-enabled devices — and it’s no surprise that dozens of companies dipped their toes into audio-based experiences at CES 2017.

Looking at this year’s CES lineup, it’s clear we’ve reached an inflection point with audio activation. GE unveiled an Alexa-powered lamp that wakes you up in the morning, Dish has programmed Alexa to tune into your favorite TV shows, and Samsung created a vacuum that tidies your home on command. Organizations across industries are starting to see the value in digital experiences that more closely mimic natural forms of communication.

The market is no longer expanding only horizontally, with competing audio-based devices such as Lenovo’s Smart Assistant or the new voice-activated robot Kuri, which brings the audiobot to life. Companies of all shapes and sizes now want to tap in, and we’ve only scratched the surface of what’s possible for audio-based digital experiences.

Borrowing from Apple’s playbook

Alexa is an extremely intelligent platform, but for it to really take off, it needs more innovation. Last year, Amazon opened its platform to third-party developers to contribute commands and app experiences to Alexa-enabled devices. It’s similar to what we saw with the Apple iPhone almost 10 years ago. The iPhone was a popular device on its own, but it really took off after Apple opened the App Store to third-party developers, which formed an ecosystem of interesting and unique experiences that users could access only via Apple devices. I believe Alexa will create a similar innovation ecosystem. After all, there is only so much excitement you can get from one-dimensional commands like “Alexa, please turn on the lights.”

With a handful of companies across industries announcing their own Alexa integrations last week, major brands are following the lead of early adopters like Domino’s, which created an Alexa integration back in February 2016 to help customers conveniently order pizza. More brands will quickly see the opportunity to reach the millions of customers who have already purchased Alexa-enabled devices, but it won’t be smooth sailing just yet.

Roadblocks ahead

Natural language processing capabilities, location-based services, and big data analysis have all improved dramatically over the past few years. But developers still face major challenges in creating voice-enabled apps for platforms like Alexa. It’s about more than adding audio capabilities to an existing app — developers must combine audio and visual capabilities to create a new type of rich, engaging experience altogether. Designing, building, testing, monitoring, and indexing services for the next experience revolution will no doubt be a challenge.

Context and orchestration will be key. It’s one thing to ask a specific banking app for an audio update about your account balance. But when you put that same question to a device that integrates with all of your banking apps, there will be multiple correct responses. We need common orchestration standards to ensure that apps within Alexa can break data silos and communicate with each other to better understand context.
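To make the banking example concrete, here is a minimal Python sketch of the ambiguity. It is purely illustrative: the `BalanceProvider` type, the `ask_balance` and `spoken_reply` functions, and the bank names are hypothetical and are not part of Alexa or any real API. It shows one possible policy, assuming each installed app can answer the question on its own: fan the question out to every app and combine the answers rather than picking a single one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BalanceProvider:
    """Hypothetical stand-in for one installed banking app."""
    name: str
    get_balance: Callable[[], float]


def ask_balance(providers: List[BalanceProvider]) -> Dict[str, float]:
    """Fan the same question out to every app and collect each answer."""
    return {p.name: p.get_balance() for p in providers}


def spoken_reply(answers: Dict[str, float]) -> str:
    """Combine all valid answers into one reply instead of picking one."""
    if not answers:
        return "I could not find any linked accounts."
    parts = [f"{name}: ${amount:,.2f}" for name, amount in sorted(answers.items())]
    return "Your balances are " + "; ".join(parts) + "."


# Two made-up apps, each with its own "correct" answer to the same question.
providers = [
    BalanceProvider("Bank A", lambda: 1250.0),
    BalanceProvider("Bank B", lambda: 310.5),
]
reply = spoken_reply(ask_balance(providers))
```

Combining every answer is only one choice. A real device might instead ask a follow-up question or rely on context to decide which account the user meant.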

It’s easier said than done. Orchestration is a pain for the software development cycle, which we’ve already seen with the mobile revolution. It’s been hard enough to determine whether apps work well in tandem with each other; many are prone to crashes when network resources are stretched across multiple apps working at once. Audio will take this to a new level of complexity because we’ll need to root out resource conflicts and cope with challenging network conditions, while also determining the context of the command and which app should answer first.

Developers must start thinking outside the silo of their specific application and make sure that the context is correct across a variety of user conditions. Testing an app in isolation is no longer enough. Some of this responsibility also falls on the device makers themselves. Platforms like Alexa must learn to properly handle dynamic contexts, which change as more apps are installed on the device. Device makers should look to algorithms that sort and uncover frequently used services and apps, or create new functions to combine answers and data from several sources. A combined effort from device makers and app developers will bring about more efficient orchestration.
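As a rough illustration of sorting by frequently used apps, the Python sketch below ranks candidate apps by how often they have been used. The `UsageRanker` class and the app names are hypothetical, invented for this example, and do not reflect how Alexa or any device actually works. It assumes the device can count past uses per app, and it treats a newly installed app as a candidate with zero uses, so the ranking shifts as apps are added.

```python
from collections import Counter
from typing import Iterable, List


class UsageRanker:
    """Hypothetical device-side ranker; not an Alexa API."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def install(self, app: str) -> None:
        # A newly installed app becomes a candidate with zero recorded uses.
        self._counts.setdefault(app, 0)

    def record_use(self, app: str) -> None:
        # Called each time the user successfully invokes an app.
        self._counts[app] += 1

    def ranked(self, candidates: Iterable[str]) -> List[str]:
        # Most-used first; ties broken alphabetically so the order is stable.
        return sorted(candidates, key=lambda a: (-self._counts[a], a))


ranker = UsageRanker()
for app in ["music", "news", "pizza"]:
    ranker.install(app)
for app in ["pizza", "music", "pizza"]:
    ranker.record_use(app)

order = ranker.ranked(["music", "news", "pizza"])
```

Raw frequency is a deliberately simple signal. Recency, time of day, or the wording of the command could all feed the same ranking function.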

As more and more companies take Alexa for a spin, it’s clear that the market is ready and excited to embrace voice-enabled devices. For developers and device makers, now is the time to tackle some of the fundamental technical roadblocks that arise when trying to combine visual and audio content in one “intelligent” experience.