Keyword queries make up a diminishing portion of web searches, believe it or not. Thanks to tools like Google Lens and Bing Visual Search, computer vision algorithms drive more than their fair share, as do the natural language processing models underpinning intelligent assistants like Alexa and Google Assistant. The increasing mix of media is one reason why Microsoft turned to another AI technique — Space Partition Tree And Graph (SPTAG) — to better parse searches. It’s available in open source today, along with example techniques and an accompanying video.
As Microsoft explains in a blog post, SPTAG lets developers search through vectors (mathematical representations of words, image pixels, and other data points) in milliseconds. SPTAG, which is written in C++ with a Python wrapper, is at the core of a number of Bing Search services, Microsoft says, and it’s been used to help researchers at the company “better understand the intent” behind “billions” of web searches.
To see it in action, try tapping out the search query “How tall is the tower in Paris?” in Bing. It’ll yield the right answer — 1,063 feet — even though the word “Eiffel” doesn’t appear in the question and the word “tall” never appears in the result.
So how’s it work? Vectors assigned to bits of data can be arranged — or mapped — in proximity to one another to indicate similarity. These proximal results get displayed to users; in Bing, after you perform a search, the indexed vectors are scanned to deliver the best match. Additionally, the assignments are used to train models that consider inputs like post-search end-user clicks to “get better at understanding the meaning of that search.”
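To make the idea of proximity concrete, here is a minimal Python sketch of vector search under some loud assumptions: the three-dimensional vectors are hand-made toy values rather than real embeddings, the labels and the helper names `cosine_similarity` and `nearest` are hypothetical, and the search is a brute-force scan that compares the query against every entry. It is not SPTAG's method or API; SPTAG's tree and graph structures exist precisely to avoid scanning everything.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, index, k=1):
    """Return the k labels whose vectors are most similar to the query.

    Brute force: every indexed vector is scored, then the scores are sorted.
    """
    scored = [(cosine_similarity(query, vec), label) for label, vec in index.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]

# Toy index: hypothetical, hand-picked vectors standing in for learned embeddings.
INDEX = {
    "Eiffel Tower height": [0.9, 0.8, 0.1],
    "Paris restaurant guide": [0.1, 0.9, 0.2],
    "Mount Everest elevation": [0.8, 0.1, 0.3],
}

# Hypothetical vector standing in for "How tall is the tower in Paris?"
query = [0.85, 0.75, 0.05]

print(nearest(query, INDEX, k=1))
```

With these toy values, the query lands closest to the "Eiffel Tower height" entry even though the two share no keyword, which is the behavior the Bing example illustrates. A scan like this grows linearly with the size of the index, so it would not come close to millisecond lookups at Bing's scale.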
Microsoft says that Bing Search has cataloged over 150 billion pieces of data to date, including single words, characters, web page snippets, and full queries. “Bing processes billions of documents every day, and the idea now is that we can represent these entries as vectors and search through this giant index of 100 billion-plus vectors to find the most related results in five milliseconds,” said Bing program manager Jeffrey Zhu.
The Bing team expects that the open source SPTAG could be used to build apps that identify the language being spoken in an audio snippet, or services that let users take pictures of flowers and identify the genus and species.
“Keyword search algorithms just fail when people ask a question or take a picture and ask the search engine, ‘What is this?’ Even a couple seconds for a search can make an app unusable,” said Bing group program manager Rangan Majumder. “We’ve only started to explore what’s really possible around vector search at this depth.”