Microsoft details AI model designed to improve Bing search

Microsoft today detailed a large neural network model it's been using in production to improve the relevance of Bing searches. The company says the model, called a "sparse" neural network, complements existing large Transformer-based networks like OpenAI's GPT-3.

Transformer-based models have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they've been used to enhance Bing search, as Microsoft has previously revealed. But they can also fail to capture more nuanced relationships between search and webpage terms beyond pure semantics.

That's where Microsoft's new Make Every feature Binary (MEB) model comes in. The large-scale, sparse model has 135 billion parameters -- the parts of the machine learning model learned from historical training data -- and space for over 200 billion binary features that reflect the subtle relationships between searches and documents. Microsoft claims MEB can map single facts to features, allowing the model to gain a more nuanced understanding of individual facts.
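To make the idea of binary features concrete, here is a minimal sketch in Python. It assumes each feature is simply the pairing of one query term with one document term, and that a feature is "on" when both terms are present. The function name `binary_features` and the `Q:...|D:...` naming scheme are hypothetical and chosen for illustration; the article doesn't specify how Microsoft actually constructs MEB's features.

```python
from itertools import product


def binary_features(query: str, doc_title: str) -> set[str]:
    """Return the set of active (value = 1) query-term x document-term features.

    Illustrative assumption: one binary feature per (query term, doc term) pair.
    Any feature not in the returned set is implicitly 0, which is what makes
    the representation sparse.
    """
    query_terms = query.lower().split()
    doc_terms = doc_title.lower().split()
    return {f"Q:{q}|D:{d}" for q, d in product(query_terms, doc_terms)}


features = binary_features("hotmail login", "Microsoft Outlook")
print(sorted(features))
```

Because only the handful of features that fire for a given query and document are stored, the feature space can be enormous while each individual lookup stays small.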

The company also says that MEB, which was trained on more than 500 billion query and document pairs from three years of Bing searches, is running in production for 100% of Bing searches in all regions and languages. It's the largest universal language model the company is serving to date, occupying 720GB when loaded into memory and sustaining 35 million feature lookups during peak traffic time.

Supercharging Bing searches

Many models overgeneralize when filling in a sentence like "[blank] can fly." For example, the models might only fill the blank with the word "birds." MEB avoids this by assigning each fact to a feature so it can assign weights that distinguish between the ability to fly in, say, a penguin versus a puffin. Instead of simply saying "birds can fly," MEB paired with Transformer models can take this to another level of classification, saying "birds can fly, except ostriches, penguins, and these other birds."

According to Microsoft, MEB keeps learning as more data is added, which suggests the model's capacity grows along with its training data. It's refreshed daily through continuous training on the latest click data, with an auto-expiration strategy that checks each feature's timestamp and filters out features that haven't shown up in the last 500 days.
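The expiration rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Microsoft's implementation: the `last_seen` mapping, the `expire_features` function, and the feature names are all hypothetical, and only the 500-day window comes from the article.

```python
from datetime import date, timedelta

EXPIRY_DAYS = 500  # window cited in the article


def expire_features(
    last_seen: dict[str, date],
    today: date,
    max_age_days: int = EXPIRY_DAYS,
) -> dict[str, date]:
    """Keep only features whose last-seen timestamp falls within the window."""
    cutoff = today - timedelta(days=max_age_days)
    return {name: seen for name, seen in last_seen.items() if seen >= cutoff}


# Hypothetical feature table: feature name -> date it last appeared in click data.
last_seen = {
    "Q:fox31|D:kdvr": date(2021, 8, 1),
    "Q:stale|D:term": date(2019, 1, 1),
}
kept = expire_features(last_seen, today=date(2021, 8, 4))
print(kept)
```

In this sketch a feature last seen exactly 500 days ago is kept; whether the real cutoff is inclusive is not stated in the article.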

Among the relationships MEB has picked up, it learned that "Hotmail" is strongly correlated with "Microsoft Outlook" -- even though the two aren't close to each other in semantic meaning. Similarly, it learned a strong connection between "Fox31" and "KDVR," where KDVR is the call sign of the TV channel in Denver, Colorado that operates under the brand Fox31.

MEB can also identify negative relationships between words or phrases, revealing what users don't want to see for a query. For example, users searching for "baseball" usually don't click on pages talking about "hockey," even though they are both popular sports. Understanding these negative relationships can help omit irrelevant search results.
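A toy example shows how a negative relationship could lower a result's score. The sketch below uses a plain logistic model over binary features, which is far simpler than MEB's neural network; the weights, feature names, and the `click_probability` function are invented for illustration and don't come from Microsoft.

```python
import math

# Made-up weights: positive means the pairing predicts a click, negative means it doesn't.
weights = {
    "Q:baseball|D:baseball": 1.2,
    "Q:baseball|D:hockey": -0.9,
}


def click_probability(active_features: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of the active binary features and squash with a sigmoid."""
    z = sum(weights.get(feature, 0.0) for feature in active_features)
    return 1.0 / (1.0 + math.exp(-z))


p_baseball_page = click_probability({"Q:baseball|D:baseball"}, weights)
p_hockey_page = click_probability({"Q:baseball|D:hockey"}, weights)
print(p_baseball_page > p_hockey_page)
```

With these invented weights, the hockey page scores below 0.5 and the baseball page above it, so a ranker using the score would push the hockey page down for a "baseball" query.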

Microsoft says deploying MEB into production led to an almost 2% increase in clickthrough rates on the top search results, as well as a reduction in manual search query reformulation of more than 1%. Moreover, MEB reduced clicks on pagination by over 1.5%. When users have to click the "next page" button, it means they didn't find what they were looking for on the first page.

"We've found very large sparse neural networks like MEB can learn nuanced relationships complementary to the capabilities of Transformer-based neural networks," Microsoft wrote in a blog post. "This improved understanding of search language results in significant benefits to the entire search ecosystem."