With a new feature announced Wednesday, Twilio is making it easier for developers to build applications that respond to what people say during phone calls.
The company’s Automated Speech Recognition beta converts a caller’s speech into text. Twilio then hands that text to developers so their systems can respond to what people say, instead of making callers navigate menus with their phone keypads. (Think: “If you want to talk to the support department, say ‘support.’”)
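To illustrate the pattern described above, here is a minimal sketch of the developer side, assuming the application has already received the caller’s transcript as a plain string. The function name `route_caller` and the keyword-to-department table are hypothetical, and the sketch does not show how the text is delivered by Twilio.

```python
import re

# Hypothetical mapping from spoken keywords to department queues.
ROUTES = {
    "support": "support",
    "help": "support",
    "billing": "billing",
    "invoice": "billing",
    "sales": "sales",
}

def route_caller(transcript: str, default: str = "operator") -> str:
    """Pick a department from recognized speech text (illustrative only)."""
    # Normalize: lowercase and split into words, dropping punctuation.
    words = re.findall(r"[a-z']+", transcript.lower())
    for word in words:
        if word in ROUTES:
            return ROUTES[word]
    # Nothing matched, so fall back to a human operator.
    return default

print(route_caller("Support, please."))                    # support
print(route_caller("I have a question about my invoice"))  # billing
print(route_caller("Um, hello?"))                          # operator
```

A real system would likely need fuzzier matching than exact keywords, since transcripts can contain recognition errors.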
The move expands the value of Twilio’s voice tools for developers by adding a layer of machine intelligence on top of its existing support for placing phone calls and sending texts with code.
Automated Speech Recognition uses Google’s Cloud Speech API to handle 89 languages and dialects, including Spanish, French, and Mandarin. Developers pay as they go, starting at two cents per 15 seconds of recognition. Customers with high volumes can get discounts that bring the price down to eight-tenths of a cent per 15 seconds.
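As a rough sketch of the pricing arithmetic, the function below estimates the cost of a recognition session. It assumes, without confirmation from the announcement, that partial 15-second blocks are rounded up to a full block; the two rates are the list price and the lowest volume price quoted above, in US dollars.

```python
import math
from decimal import Decimal

# Rates per 15-second increment, in US dollars, as quoted in the article.
LIST_RATE = Decimal("0.02")
LOWEST_VOLUME_RATE = Decimal("0.008")

def recognition_cost(seconds: float, rate: Decimal = LIST_RATE) -> Decimal:
    """Estimate cost, assuming partial 15-second blocks are rounded up."""
    increments = math.ceil(seconds / 15)
    return increments * rate

print(recognition_cost(40))                      # 0.06  (3 increments)
print(recognition_cost(40, LOWEST_VOLUME_RATE))  # 0.024
print(recognition_cost(60 * 60))                 # 4.80  (240 increments)
```

Under these assumptions, an hour of recognition costs $4.80 at the list rate and $1.92 at the lowest volume rate.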
Twilio also announced that it plans to launch a new Understand API that will use natural language processing to identify the intent behind text passed to it. The API is designed to work natively with Twilio’s Voice and SMS tools, as well as Amazon’s Alexa virtual assistant.
Understand is still under development and, unlike the speech recognition feature, is being built without relying on third-party AI APIs. It’s unclear when it will be available to developers.
The speech recognition news comes as part of the company’s Signal developer conference, which is being held this week in San Francisco. Twilio also announced a new API for easily and anonymously connecting conversation participants, as well as a series of partner integrations that extend its developer tools to other messaging platforms.
Correction: We initially described the Twilio service as VoIP. Because it also runs over traditional voice networks, we changed “VoIP” to “voice” in two instances.