Research suggests ways voice assistants could accommodate non-native English speakers

In a paper published on the preprint server arXiv.org, researchers at University College Dublin investigated how non-native English users experience voice assistants -- specifically Google Assistant -- compared with native users. By identifying the semantic and stylistic differences between the commands the two groups of speakers used over the course of the experiments, the coauthors say their work demonstrates the importance of expanding the types of users recruited into research to ensure assistants are designed with inclusivity in mind.

A growing body of research shows that voice assistants including Siri, Alexa, and Google Assistant understand non-native accents and verbiage poorly. A Washington Post-commissioned study published in July 2018 found that people who speak Spanish as a first language are understood 6% less often than native English speakers who grew up around California or Washington. More recently, in a test conducted by speech recognition testing lab Vocalize, both Siri and Alexa failed to understand speakers with Chinese accents 22% of the time.

The Dublin researchers recruited 32 participants from a European university via email: 16 native English speakers and 16 native Mandarin speakers. Both cohorts were asked to complete 12 tasks with Google Assistant on both a smartphone and a smart speaker, including playing music, setting an alarm, converting values, asking for the time in a particular location, controlling device volume, and requesting weather information. After interacting with both devices, the subjects took part in an interview focusing on topics like general views toward voice assistants, experiences with voice assistants in the experiment, and reflections on how they spoke to each system.

The researchers say that irrespective of device type, "clear differences" emerged between the native and non-native speakers' experiences when using Google Assistant:

Native speakers prioritized vocal clarity, brevity, and planning when approaching interactions with the assistant, while non-native speakers altered their vocabulary based on whether or not they knew a particular word.
Non-native speakers were sensitive to their pronunciation or need to retrieve the correct words during interactions. They regularly felt like they struggled to wake Google Assistant.
Non-native speakers also suggested they sometimes needed extra time to formulate a sentence and that the assistant didn't take this into consideration, resetting or barging in before they finished their request. In contrast, native speakers perceived the delay between speaking to the assistant and receiving its response as too long, which led them to question whether the assistant was working correctly.
Non-native speakers said screen-based feedback was important in supporting their experiences. For instance, speech recognition transcriptions displayed on the smartphone's screen were found to help build speakers' confidence in the assistant's recognition capabilities while also pinpointing reasons the assistant didn't understand something.

The findings aren't exactly earth-shattering -- linguistic differences in pronunciation have stumped algorithms for years. (A recent study found that YouTube's automatic captioning did worse with Scottish speakers than American Southerners.) But they put into relief the technical challenges companies like Google, which has sold tens of millions of smart speakers, have yet to overcome.

In light of the participants' responses, the researchers conclude voice assistants might provide better experiences for non-native speakers if the systems were aware of previous attempts to say commands. Assistants could also provide contextual clues in cases where the intent of a command was recognized but its nouns were not, the researchers say, or use priming keywords and structures to help users rephrase commands.

"Our [results] highlight some important differences between how ... speakers interact with [assistants]. [Native] speakers emphasized the importance of succinct and short utterances [while non-native] speakers seemed to more heavily place the burden of potential interaction failure on themselves, seeing their pronunciation and lack of linguistic knowledge as significant barriers," the study's coauthors wrote. "Future design of [voice assistants] should look to tailor the experience if the system identifies a user as a non-native speaker ... and should more deeply explore ways to tailor the ... experience. Without these changes, [non-native] speakers may be at risk of abandoning [assistants] use more readily."

In a statement, a Google spokesperson told VentureBeat the company is "committed to making progress" in this area, and that it's developed open source tools and data sets to help identify and carve out bias from speech recognition models. "Fairness is one of our core AI principles ... We've been working on the challenge of accurately recognizing variations of speech for several years, and will continue to do so," the spokesperson said.