You’d think that adapting Alexa, which supports a range of languages, including Spanish, German, and Japanese, to varieties within a dialect continua would be a piece of cake compared with “teaching” it new language families, but that isn’t the case. According to the Seattle company’s researchers, the gulf between, say, British English and American English is wide enough that it often necessitates training machine learning models from scratch.
That’s not ideal — Alexa should theoretically be able to bootstrap language training using preexisting knowledge. This is why scientists at Amazon are investigating a technique that susses out the topic of a customer’s request, such as music, weather, or sports, and identifies language that isn’t relevant to a given domain.
They describe their work in a newly published paper (“Locale-agnostic Universal Domain Classification Model in Spoken Language Understanding“) that was presented last week at the North American Chapter of the Association for Computational Linguistics.
“One reason multi-task training for domain classification is challenging is that requests to the same domain could look wildly different in different locales,” wrote Young-Bum Kim, a scientist in Amazon’s Alexa AI division. “Requests to the restaurant domain, for instance, would feature much different restaurant names in Mumbai than they would in London, even though customers are requesting the same services — address information, menu information, reservations, and so on. In [some] cases … where requests are more uniform across locales, the outputs of several different locale-specific models could reinforce each other, improving accuracy.”
The team’s domain classifier performs several tasks simultaneously, chiefly learning a statistical model for a language that captures consistencies across regions and learning different classifications on the outputs of both general and locale-specific models. Importantly, an attention mechanism gives different emphases to the outputs of different locale-specific models depending on the input, such that when input data is locale-dependent, it assigns most of its weight to a single locale-specific model and ignores the outputs of the other locale-specific models.
In order to identify domains that should receive “special treatment” at run time, the researchers combine the outputs of the locale-specific models into a single vector at training time, with heavily weighted outputs contributing more to the vector’s final values than lower-weighted outputs. They then concatenate the vector with the output of the locale-independent model and pass it to another network layer for domain classification.
In experiments with four different variants of English — U.S., U.K., India, and Canada — the researchers’ model showed accuracy improvements of 18%, 43%, 116%, and 57% versus models trained on each variant individually. By contrast, in a second test involving adversarial training — that is, training that forced the locale-independent model to extract characteristics of the data consistent across locales — performance tended to be worse than that of locale-specific models trained separately on the same data sets, with the exception of the U.K. samples. The team leaves to future work determining the reason for that improvement.