Could natural language models improve their ability to answer questions on the fly? That’s what a team of Amazon researchers set out to determine in a study scheduled to be presented at the 2020 Association for the Advancement of Artificial Intelligence (AAAI) conference in New York. They propose a method for adapting models based on Google’s Transformer architecture, which is particularly good at learning long-range dependencies among input data (such as the semantic and syntactic relationships between individual words of a sentence), to the problem of answer selection. The team says that in tests on a benchmark data set, their proposed model demonstrated a 10% absolute improvement in mean average precision (which measures the quality of a sorted list of answers according to the correctness of the ranking) over the previous state-of-the-art answer selection model, achieving an error rate reduction of 50%.

The approach — Transfer and Adapt, or TANDA — was first proposed late last year but has since been refined.

As the researchers explain, they used transfer learning, a technique in which an AI model pretrained on one task (here, word sequence prediction) is fine-tuned on another (here, answer selection), and added an intermediate step between the pretraining of the source model and its adaptation to new domains. In this intermediate step, the researchers fine-tuned the language model on a large corpus of general question-answer pairs based on the publicly available Natural Questions data set, which was designed for training reading comprehension systems. The modified version of this corpus, dubbed ASNQ for “answer selection NQ,” is instead tailored to training answer selection systems. It complements a small body of topic-specific questions and answers in the target domain, which was used to further tune the model.
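The two-stage order described above (general fine-tuning first, target-domain fine-tuning second) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a two-feature logistic-regression scorer stands in for the Transformer, the question-answer pairs are invented numbers, and the function names `fine_tune` and `loss` are hypothetical rather than taken from the researchers' code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, features):
    # Probability that a candidate answer is correct under the toy model.
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def loss(weights, examples):
    # Average cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for features, label in examples:
        p = score(weights, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(examples)

def fine_tune(weights, examples, lr=0.1, epochs=50):
    # Full-batch gradient descent that starts from the given weights,
    # so each stage continues from where the previous one stopped.
    weights = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in examples:
            error = score(weights, features) - label
            for j, x in enumerate(features):
                grad[j] += error * x
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights

# Stand-in for the pretrained model (assumption: all-zero weights).
pretrained = [0.0, 0.0]

# Invented stand-in for a large, general answer selection corpus.
general_pairs = [
    ([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.8, 0.3], 1),
    ([0.1, 0.8], 0), ([0.2, 1.0], 0), ([0.3, 0.9], 0),
]

# Invented stand-in for a small, topic-specific target set.
target_pairs = [([1.0, 0.5], 1), ([0.4, 0.9], 0)]

transferred = fine_tune(pretrained, general_pairs)   # intermediate "transfer" step
adapted = fine_tune(transferred, target_pairs)       # "adapt" step on target data
```

Because the transfer step does not depend on the target domain, `transferred` could be computed once and reused as the starting point for several different target sets.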

As with all deep neural networks, Transformers contain neurons (mathematical functions) arranged in interconnected layers that transmit signals from input data and slowly adjust the synaptic strength (weights) of each connection. That’s how such networks extract features and learn to make predictions, but Transformers are distinguished by attention: every output element is connected to every input element, and the weightings between them are calculated dynamically.
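As a rough illustration of weightings being calculated dynamically, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: the learned query, key, and value projections and the multiple heads of a real Transformer are omitted, and the three two-dimensional token vectors are invented.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # For each query, score every key, normalize the scores into weights,
    # and return the weighted average of the values.
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Invented token vectors; in self-attention the same sequence supplies
# the queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and has one entry per input token, so every output depends on every input, and the weights change whenever the input vectors change.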

The researchers say their method can be fine-tuned on target data without a search for hyperparameters (characteristics of an AI model such as the number of layers, the number of nodes per layer, and the learning rate of the training algorithm, which are often determined through trial and error). This means that it can be adapted to a target domain with very little training data, and that it’s robust to noise (or errors) in the target domain data. Plus, the most time-consuming part of the procedure, the intermediate step, only needs to be performed once.

According to the team, the model achieved a mean average precision of 92% and 94.3% on WikiQA and TREC-QA, respectively, a significant improvement over the previous records of 83.4% and 87.5%. As for the mean reciprocal rank, which reflects how close to the top of the list the first correct answer appears, it was 93.3% and 97.4% for the system, up from 84.8% and 94%, respectively.
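For readers unfamiliar with the two metrics, the sketch below computes them under their standard definitions. Each ranking is a list of 0/1 flags in ranked order, with 1 marking a correct answer. The example rankings are invented, and treating 100 minus the mean average precision as the "error rate" is an assumption about how such reductions are usually derived, not something the researchers spell out here.

```python
def average_precision(labels):
    # Mean of precision-at-k over the ranks k where a correct answer appears.
    hits = 0
    total = 0.0
    for rank, is_correct in enumerate(labels, start=1):
        if is_correct:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(labels):
    # 1 / rank of the first correct answer, or 0 if there is none.
    for rank, is_correct in enumerate(labels, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

def mean_reciprocal_rank(rankings):
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)

def relative_error_reduction(old_score, new_score):
    # Assumes the error rate is 100 minus the score, with scores in percent.
    old_error = 100.0 - old_score
    new_error = 100.0 - new_score
    return (old_error - new_error) / old_error

# Two invented questions with three ranked candidate answers each.
rankings = [[1, 0, 1], [0, 1, 0]]
map_score = mean_average_precision(rankings)   # (5/6 + 1/2) / 2 = 2/3
mrr_score = mean_reciprocal_rank(rankings)     # (1 + 1/2) / 2 = 0.75

# WikiQA figures quoted above: 83.4% before, 92% after.
wikiqa_reduction = relative_error_reduction(83.4, 92.0)
```

Under that assumption, the WikiQA figures work out to an error reduction of roughly 52%.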

“The last few years have seen great advances in the design of language models, which are a critical component of language-based AI systems,” wrote Alexa Search team member Alessandro Moschitti in a blog post. “Language models can be used to compute the probability of any given sequence (even discontinuous sequences) of words, which is useful in natural-language processing.”

