Ever heard of query rewriting? It’s a technique used to mitigate errors in spoken language understanding (SLU) pipelines like those underpinning Amazon’s Alexa, Google Assistant, Apple’s Siri, and other voice assistants. Many SLU systems are split into two components: an automatic speech recognition (ASR) system that converts audio to text and a natural language understanding (NLU) component that extracts meaning from the resulting text. Each can introduce errors (e.g., misrecognized words caused by background noise or speaker accents) that accumulate and create friction in the conversation.

Fortunately, query rewriting has shown promising results in production systems. It entails taking a transcript and rewriting it before sending it to the downstream NLU system. That’s likely why researchers from Drexel University and Amazon investigated, in a preprint paper, an approach that uses AI to replace original queries with reformulated ones.

The team’s system selects the most relevant candidates as the query’s rewrite, using a model that’s trained to capture latent syntactic and semantic information from a query. Given an input query, an embedder module extracts a representation by feeding the query into a pretrained contextual word model. The representation is then merged into a query-level mathematical representation (an embedding), at which point a mechanism is used to measure the similarity of two queries. Millions of indexed original queries and rewrites come from a set of pre-defined, high-precision rewrite pairs selected from Alexa’s historical data, and the most relevant are retrieved by the system on demand.
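The retrieval idea described above can be illustrated with a small sketch. It is not the researchers' implementation: the hash-based `token_vector` is a toy stand-in for a pretrained contextual word model, mean pooling and cosine similarity are assumed choices for the merging step and the similarity mechanism, and the example rewrite pairs are invented for illustration.

```python
import hashlib
import math

DIM = 16  # illustrative embedding size (assumption, not from the paper)


def token_vector(token):
    """Toy stand-in for a pretrained contextual word model (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to values in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:DIM]]


def embed_query(query):
    """Merge token vectors into one query-level embedding by mean pooling."""
    vectors = [token_vector(t) for t in query.lower().split()]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two embeddings (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(rewrite_pairs):
    """Embed each original query once and store it next to its rewrite."""
    return [(embed_query(orig), orig, rewrite) for orig, rewrite in rewrite_pairs]


def retrieve_rewrites(query, index, k=20):
    """Return the k (score, rewrite) entries most similar to the query."""
    q = embed_query(query)
    scored = [(cosine(q, emb), rewrite) for emb, _orig, rewrite in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


# Hypothetical (original, rewrite) pairs, invented for this example.
PAIRS = [
    ("play imagine dragon", "play imagine dragons"),
    ("turn of the lights", "turn off the lights"),
    ("what's the whether", "what's the weather"),
]
INDEX = build_index(PAIRS)
```

Calling `retrieve_rewrites("play imagine dragon", INDEX, k=1)` returns the rewrite stored with the closest indexed query. A production system would presumably swap the toy embedder for a trained model and the linear scan for an approximate nearest-neighbor index.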

“The NLU component in a SLU system provides a semi-structured semantic representation for queries, where queries of various text forms but the same semantics can be grouped together through the same NLU hypothesis,” the researchers noted. “For example, ‘could you please play imagine dragons,’ ‘turn on imagine dragons,’ [and] ‘play songs from imagine dragons’ carry the same semantics and have the same NLU hypothesis, but their texts are different. Intuitively, augmenting the query texts with the less noisy NLU hypotheses could be helpful.”
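To make the researchers' example concrete, here is a minimal sketch of how differently worded queries can collapse onto one NLU hypothesis, and how a query's text might be augmented with that hypothesis. The `domain|intent|slot:value` hypothesis format, the `[SEP]` separator, and the function names are all hypothetical; the article does not specify how hypotheses are encoded or combined with the text.

```python
from collections import defaultdict

# Hypothetical NLU output: each query text maps to a semi-structured hypothesis.
NLU_HYPOTHESES = {
    "could you please play imagine dragons": "Music|PlayMusic|ArtistName:imagine dragons",
    "turn on imagine dragons": "Music|PlayMusic|ArtistName:imagine dragons",
    "play songs from imagine dragons": "Music|PlayMusic|ArtistName:imagine dragons",
    "turn off the lights": "HomeAutomation|TurnOff|Device:lights",
}


def group_by_hypothesis(hypotheses):
    """Group query texts that share the same NLU hypothesis."""
    groups = defaultdict(list)
    for query, hypothesis in hypotheses.items():
        groups[hypothesis].append(query)
    return dict(groups)


def augment(query, hypothesis, sep=" [SEP] "):
    """Append the hypothesis to the query text (assumed augmentation scheme)."""
    return f"{query}{sep}{hypothesis}"


GROUPS = group_by_hypothesis(NLU_HYPOTHESES)
```

In this toy data, the three Imagine Dragons phrasings end up in a single group, which is the property the quote relies on: the hypothesis is a less noisy signal than the surface text.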

To train the system, the team constructed two data sets: one to pre-train the utterance embeddings and another to fine-tune the pretrained model. The pre-training set comprised 11 million sessions with about 30 million utterances, while the fine-tuning set, which was generated using an existing rephrase detection model pipeline, had 2.2 million utterance pairs.

The researchers evaluated query rewriting performance by comparing the NLU hypotheses of the retrieved rewrite candidates with the actual NLU hypotheses in an annotated test set of 16,000 pairs. For each query, they retrieved the top 20 rewrites and used those rewrites’ NLU hypotheses to measure system performance with standard information retrieval metrics.
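The evaluation setup can be sketched as follows. The article does not say which information retrieval metrics were used, so hit rate at k and mean reciprocal rank appear here only as two common examples; the function names and the hypothesis strings in the usage note are illustrative assumptions.

```python
def hit_at_k(retrieved, gold, k=20):
    """True if any of the top-k retrieved hypotheses matches the gold one."""
    return any(h == gold for h in retrieved[:k])


def reciprocal_rank(retrieved, gold):
    """1/rank of the first matching hypothesis, or 0.0 if none matches."""
    for rank, hypothesis in enumerate(retrieved, start=1):
        if hypothesis == gold:
            return 1.0 / rank
    return 0.0


def evaluate(examples, k=20):
    """Average both metrics over (retrieved_hypotheses, gold_hypothesis) pairs."""
    n = len(examples)
    if n == 0:
        return {"hit_rate": 0.0, "mrr": 0.0}
    hits = sum(1 for retrieved, gold in examples if hit_at_k(retrieved, gold, k))
    rr_total = sum(reciprocal_rank(retrieved[:k], gold) for retrieved, gold in examples)
    return {"hit_rate": hits / n, "mrr": rr_total / n}
```

For instance, with one query whose gold hypothesis appears at rank 2 and another whose gold hypothesis is never retrieved, `evaluate` reports a hit rate of 0.5 and a mean reciprocal rank of 0.25.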

The team reports that pre-training not only significantly reduces the need for high-quality query retrieval training pairs, but also “remarkably” improves performance. “While we focus on pre-training for QR task in this paper, we believe a similar strategy could potentially apply to other tasks in NLU,” they wrote, “[for example,] domain classification.”