Picovoice's voice assistant promises cloud-level accuracy on edge devices

Picovoice, a Canadian company, wants to put a voice assistant that promises cloud-level accuracy onto all manner of edge devices, and even within a web browser. The process has three components -- wake word detection, speech-to-text transcription, and speech-to-intent. Picovoice previously rolled out Porcupine for wake word detection and Rhino to handle speech-to-intent, and it has now added Cheetah for speech-to-text transcription to complete the trio. All are available via GitHub.

The stack operates in real time on-device, without an internet connection, and promises extremely low resource requirements. Picovoice's goal is solving two significant problems at once: privacy and resource requirements.

The voice assistant process is typically resource-heavy and tied to the cloud. But there are significant privacy concerns around cloud connectivity. In these very pages, we've discussed how most smart assistants collect and store your voice data and what you can do to control it. And of course, Apple is under fire for how its contractors listened to Siri recordings.

For the privacy-minded, then, keeping smart assistants away from a troublesome back end owned by a tech giant is a must. It's difficult to run a voice assistant without cloud support, but Picovoice claims that its offering can run even on a $5 Raspberry Pi Zero. It can also run directly in a browser and is generally platform agnostic. Cheetah, for instance, runs on iOS, watchOS, Android, Linux, macOS, Windows, Raspberry Pi, BeagleBone, and "all modern web browsers," according to Picovoice founder and president Alireza Kenarsari-Anhari.

The low resource requirements are important not only because compute resources are generally limited on edge devices, but also because higher demands drain battery life faster. The key way Picovoice reduces resource needs is by keeping things domain specific. That is, you'd use a different model for a TV than for a washing machine. That makes intuitive sense. Consider the range of possibilities the smart assistant on your phone has to handle. When you use a wake word, it doesn't know if you're going to send a voice text, ask for a song, make an internet search query, set a meeting, or do something else entirely. It has to start with the possibility of everything.

Picovoice's technology, by contrast, starts a level or two deeper. By keeping a given application limited to a specific device like a coffee maker, there are orders of magnitude fewer possible requests, commands, and actions involved. With this method, Picovoice claims that it's achieved accuracy parity with Google and Amazon.
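To make the domain-specific idea concrete, here is a minimal Python sketch of intent matching for a coffee maker. It is purely illustrative and is not Picovoice's API or model format: the intent names, phrasings, and the `infer_intent` helper are all hypothetical, and simple regular expressions stand in for whatever Rhino actually does internally. The point is only that a narrow domain leaves very few valid commands to recognize.

```python
import re
from typing import Optional

# Hypothetical grammar for a coffee maker: two intents, each with a small
# set of allowed phrasings. This is an assumption made for illustration,
# not the format Picovoice uses.
COFFEE_MAKER_INTENTS = {
    "brew": re.compile(
        r"^(?:make|brew) (?:me )?an? "
        r"(?P<size>small|medium|large) (?P<drink>espresso|latte|americano)$"
    ),
    "stop": re.compile(r"^(?:stop|cancel)(?: brewing)?$"),
}


def infer_intent(utterance: str) -> Optional[dict]:
    """Return the matching intent and its slots, or None if out of domain."""
    text = utterance.lower().strip()
    for name, pattern in COFFEE_MAKER_INTENTS.items():
        match = pattern.match(text)
        if match:
            return {"intent": name, "slots": match.groupdict()}
    # Anything outside the coffee maker's domain is simply rejected.
    return None


print(infer_intent("Make me a large latte"))
print(infer_intent("What's the weather tomorrow?"))
```

A general-purpose assistant would have to consider the second request as well; the coffee maker can discard it outright, which is where the savings come from.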

The idea, then, is that a brand or company can use Picovoice to create custom voice experiences for customers. Ostensibly the cost is much lower than that of a cloud service back end, although hard costs are unclear -- it appears that the cost will scale dramatically on a case-by-case basis.

Direct competitors to Picovoice include Mycroft, Snips, and Sensory. But Kenarsari-Anhari asserted that Mycroft doesn't have its own speech-to-text engine and uses third parties, and that Snips and Sensory demand more runtime resources than Picovoice and work on fewer platforms. They do, however, promise some of the same privacy advantages.

Picovoice already has some customers, although the company didn't elaborate much beyond naming a few names -- LG, Whirlpool, and Local Motors, in addition to "dozens" of others that can't yet be discussed.
