Sensory's VoiceHub promises to add multilingual NLU to any product

Adding speech recognition to a product isn't easy. Even assuming the device has the microphones and chips needed to detect spoken words, building software that reliably turns even one language's worth of input into correct responses is a major hurdle for most companies. Today, Sensory is officially releasing VoiceHub, a solution that promises to add multilingual natural language understanding (NLU) to any product, enabling enterprises without prior NLU expertise to deploy globally viable smart hardware at scale.

Using the web-based VoiceHub portal, developers "with no programming experience" can define the wake word, simple commands, and large natural language vocabulary a given product needs, including region-specific customizations across multiple dialects of English, French, Mandarin, Portuguese, and Spanish. Once a model has been built on the web, downloading it to a test device is as easy as scanning a QR code. The model then works with Sensory's TrulyNatural on-device speech recognition software, which provides the large vocabulary and parsing capabilities needed to respond to spoken requests in multiple languages.

VoiceHub's release matters to technical decision makers because it could substantially shorten time to market and improve the performance of products with voice interfaces. It lets any enterprise start using NLU as an alternative to touch and/or keyboard input, or to reduce its reliance on human customer service agents. It also avoids sharing data with Amazon, a key concern for companies that may compete with Amazon and might otherwise consider Alexa-based solutions.

Sensory notes that its software can run on the sub-$70, Arm-based STM32MP1 Discovery Board from STMicroelectronics, as well as on Android and iOS devices, letting developers prototype working products "in a matter of minutes, not days." Also notable: the finished solution works without an internet connection, because processing is handled directly on the device. That means the enterprise developer decides how much or how little voice-related data to share.

In a video recorded with the larger STM32MP1 Evaluation Board, Sensory demonstrated a custom coffee shop ordering platform that lets customers request coffee, tea, or hot cocoa from a microphone-equipped screen. After saying the wake word "Hey Barista," the customer names a drink and any customizations, then says "confirm" to place the order. The inexpensive hardware includes dual noise-canceling microphones that let Sensory's software correctly recognize spoken commands from up to 10 feet away, even amid a coffee shop's typical ambient noise.
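The ordering flow in the demo (wake word, then a drink with customizations, then "confirm") is essentially a small dialogue state machine. The sketch below models only that control logic, operating on already-recognized text. It is a hypothetical illustration, not Sensory's VoiceHub or TrulyNatural API: the class `OrderSession`, the method `hear`, and the `DRINKS` and `CUSTOMIZATIONS` vocabularies are invented for this example, and real speech recognition and wake word detection are replaced by plain strings.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class State(Enum):
    IDLE = auto()        # waiting for the wake word
    ORDERING = auto()    # waiting for a drink plus optional customizations
    CONFIRMING = auto()  # waiting for "confirm" or "cancel"


# Hypothetical vocabulary for illustration only; not taken from VoiceHub.
WAKE_WORD = "hey barista"
DRINKS = ("coffee", "tea", "hot cocoa")
CUSTOMIZATIONS = ("decaf", "extra shot", "large", "oat milk", "small")


class OrderSession:
    """Hypothetical text-level model of the wake word -> order -> confirm dialogue."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.pending: Optional[Tuple[str, List[str]]] = None
        self.placed: List[Tuple[str, List[str]]] = []

    def hear(self, utterance: str) -> str:
        """Advance the dialogue with one recognized utterance and return a reply."""
        text = utterance.lower().strip()

        if self.state is State.IDLE:
            # While idle, everything except the wake word is ignored.
            if text == WAKE_WORD:
                self.state = State.ORDERING
                return "listening"
            return "ignored"

        if self.state is State.ORDERING:
            # Use the longest drink name that appears in the utterance.
            matches = [d for d in DRINKS if d in text]
            if not matches:
                return "which drink would you like?"
            drink = max(matches, key=len)
            extras = [c for c in CUSTOMIZATIONS if c in text]
            self.pending = (drink, extras)
            self.state = State.CONFIRMING
            detail = f" with {', '.join(extras)}" if extras else ""
            return f"you ordered {drink}{detail}; say confirm"

        # CONFIRMING: only "confirm" or "cancel" change the state.
        if text == "confirm" and self.pending is not None:
            self.placed.append(self.pending)
            self.pending = None
            self.state = State.IDLE
            return "order placed"
        if text == "cancel":
            self.pending = None
            self.state = State.IDLE
            return "order cancelled"
        return "say confirm or cancel"
```

For example, `OrderSession().hear("large coffee")` returns "ignored" because no wake word has been heard yet. Keeping the dialogue logic separate from recognition like this means the same state machine could be tested with typed text before any audio hardware is involved.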

Since opening a limited beta program in October 2020, Sensory says designers have also successfully tested VoiceHub with automotive, wearable, smart speaker, and smart home products; the company expects the final release to accelerate both branded voice experiences and domain-specific voice assistants across a wider range of use cases. And because VoiceHub ties into TrulyNatural's large, multilingual vocabulary, Sensory expects solutions to be "truly conversational," able to understand "millions of unique phrases" rather than just a handful of words.