If you’ve ever been frustrated by a virtual assistant’s inability to answer questions satisfactorily, not to worry: researchers at Amazon are on the case. In a paper presented in Paris last week at the ACM SIGIR Conference on Research and Development in Information Retrieval, a team from the Seattle company’s Alexa AI Natural Understanding group described a question-answering technique (“Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs”) that demonstrates “clear improvement” over competing methods.
As lead author Abdalghani Abujabal explains in a blog post, most computerized question-answering systems take one of two approaches: They either perform a text search and try to infer the relationships among entities named in the text or they tap a hand-crafted knowledge graph that encodes these relationships. Both approaches struggle with complex questions like “Which Nolan films won an Oscar but missed a Golden Globe?” Text search would require a single document containing all the information needed to answer the question, while a knowledge graph would have to explicitly represent every relationship the question implies.
The researchers sought to combine the best of both worlds with a system that performs a standard text search (an ordinary web search) on the basis of the input question, using the full text of the question as a search string. It retrieves the 10 or so documents the search algorithm ranks highest, then applies algorithms that identify named entities and parts of speech within each document and uses them to extract subject-predicate-object triples like “Nolan, directed, Inception” and “The Social Network, winner of, Best Screenplay.” Lastly, it constructs an “ad hoc” knowledge graph from the extracted triples on the fly.
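To make the graph-building step concrete, here is a minimal Python sketch of how a handful of triples could be assembled into a small labeled graph. The triples, the `build_graph` helper, and the dictionary layout are illustrative assumptions, not the representation used in the Amazon paper.

```python
# Toy triples of the kind the article describes; the data is hypothetical.
triples = [
    ("Nolan", "directed", "Inception"),
    ("Inception", "won", "Oscar"),
    ("Nolan", "directed", "Memento"),
]

def build_graph(triples):
    """Map each node to a list of (predicate, object) edges leaving it."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, []).append((pred, obj))
        # Make sure objects also appear as nodes, even without outgoing edges.
        graph.setdefault(obj, [])
    return graph

graph = build_graph(triples)
for node, edges in graph.items():
    print(node, "->", edges)
```

Storing the predicate on each edge keeps the relationship text available for later matching against the words of the question.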
Post-construction, the system leverages syntactic clues and data from existing graphs — like lexicons and embeddings — to suss out which names in the graph refer to the same entities. Name alignments are assigned confidence scores, and a search algorithm looks for cornerstones in the graph, or words that very closely match individual words in the search string.
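The cornerstone idea can be sketched in a few lines of Python. This version uses the standard library's `difflib.get_close_matches` as a stand-in for whatever matching the researchers actually use; the `find_cornerstones` helper, the stopword list, the 0.85 cutoff, and the node labels are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical stopword list so that words like "a" do not create matches.
STOPWORDS = {"a", "an", "the", "but", "which"}

def find_cornerstones(question, node_labels, cutoff=0.85):
    """Return node labels containing a token that closely matches a question word."""
    words = [w.strip("?.,!\"'").lower() for w in question.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    cornerstones = []
    for label in node_labels:
        for token in label.lower().split():
            # get_close_matches returns the question words whose similarity
            # ratio to the token is at least `cutoff`.
            if difflib.get_close_matches(token, words, n=1, cutoff=cutoff):
                cornerstones.append(label)
                break
    return cornerstones

question = "Which Nolan films won an Oscar but missed a Golden Globe?"
nodes = ["Christopher Nolan", "Inception", "Oscar", "won", "directed"]
cornerstones = find_cornerstones(question, nodes)
print(cornerstones)
```

A string-similarity cutoff is only a rough proxy for "very closely match"; the article mentions lexicons and embeddings as additional signals.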
The system seeks out answers to questions that lie on paths connecting cornerstones, and it evaluates those paths according to two criteria: their length and the confidence scores from the data triples and the name alignments. It eliminates all but the shortest and highest-confidence paths, and it removes all the cornerstones from the graph, along with all the nodes that aren’t named entities.
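As a rough illustration of weighing paths by length and confidence, the Python sketch below sorts candidate paths so that shorter ones come first and ties are broken by the product of their edge confidences. The article does not say how the two criteria are combined, so this ordering, the `keep_best_paths` helper, and the sample paths and scores are assumptions.

```python
from math import prod

def keep_best_paths(paths, k=2):
    """Keep the k best paths: fewest edges first, then highest confidence.

    Each path is a (nodes, edge_confidences) pair.
    """
    def sort_key(path):
        _nodes, confidences = path
        # Negate the confidence product so higher confidence sorts earlier.
        return (len(confidences), -prod(confidences))
    return sorted(paths, key=sort_key)[:k]

# Hypothetical paths between the cornerstones "Nolan" and "Oscar".
paths = [
    (["Nolan", "directed", "Inception", "won", "Oscar"], [0.9, 0.9, 0.8, 0.8]),
    (["Nolan", "snubbed for", "Oscar"], [0.5, 0.4]),
    (["Nolan", "nominated for", "Oscar"], [0.9, 0.7]),
]

best = keep_best_paths(paths, k=2)
for nodes, confidences in best:
    print(" - ".join(nodes), round(prod(confidences), 2))
```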
Finally, the algorithm ranks the remaining entities according to several criteria (such as the weights of the paths that connect them to cornerstones and their distance from cornerstones), and the top-ranked entity is returned as the answer to the search question.
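The ranking step might look something like the following Python sketch, which scores each candidate entity only by its breadth-first distance from the cornerstones (closer is better). Path weights, which the article also lists as a criterion, are left out here; the scoring formula, the helper names, and the toy graph are assumptions rather than details from the paper.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return the number of hops from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def rank_candidates(graph, cornerstones, candidates):
    """Rank candidates by the sum of 1/distance to each reachable cornerstone."""
    scores = {c: 0.0 for c in candidates}
    for cornerstone in cornerstones:
        dist = bfs_distances(graph, cornerstone)
        for c in candidates:
            if dist.get(c, 0) > 0:
                scores[c] += 1.0 / dist[c]
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Hypothetical undirected graph, written as an adjacency map.
graph = {
    "Nolan": ["Inception", "Memento"],
    "Inception": ["Nolan", "Oscar"],
    "Memento": ["Nolan"],
    "Oscar": ["Inception"],
}

ranking = rank_candidates(graph, ["Nolan", "Oscar"], ["Memento", "Inception"])
print(ranking[0])
```

In this toy graph, "Inception" sits one hop from both cornerstones, so it outranks "Memento", which is adjacent only to "Nolan".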
The team reports that in 36 tests using two different data sets and three different performance metrics, their system outperformed three baselines on 34 and finished a close second on the other two, with an average of 25% (and a high of 80%) improvement over the best-performing baseline. They leave to future work integrating the ad hoc knowledge graphs with existing, curated knowledge graphs and adapting the search algorithm accordingly.