Airbnb details its journey to AI-powered search

Online booking platform Airbnb has more than 5 million property listings and tens of thousands of tours, hikes, and other travel experiences on offer. That's a lot for anyone to sift through, but the San Francisco startup believes that artificial intelligence (AI) can lend a hand.

In a paper published on the preprint server arXiv.org ("Applying Deep Learning To Airbnb Search"), researchers at the company describe how, over the course of two years, they implemented a sophisticated neural network -- layers of mathematical functions that loosely mimic the behavior of neurons in the human brain -- in Airbnb's web and mobile apps to improve the relevance of search results.

The report follows on the heels of Airbnb's in-house AI system that turns design sketches into product source code, and its machine learning-powered language system that translates listing reviews into guests' native languages.

"The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model," they wrote. "The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau."

As the researchers explained, most guests start with a search at Airbnb's website for homes available in a particular geographic region. Those searches return ordered lists of listings sampled from Airbnb's millions.

Initially, a "manually crafted" scoring function determined which homes and rooms made their way to the top. Eventually, a gradient boosted decision tree (GBDT) -- a model that identifies and ranks predictive factors -- supplanted the scoring function, a switch the researchers said led to "one of the largest step improvements in home bookings in Airbnb's history."

But as the gains in online bookings leveled off, the team turned its attention to neural networks.

Airbnb doesn't rely on just one AI system. It taps an "ecosystem" of algorithms that predict the likelihood a host will accept a guest's request for booking, and that a guest will rate a trip or experience highly. They're trained with user interactions -- searches are logged, and every model has access to them. And new models, once trained, are tested to see whether they achieve a statistically significant increase in bookings.
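The article doesn't say which statistical test Airbnb runs, but the idea of checking for a "statistically significant increase in bookings" can be illustrated with a standard one-sided two-proportion z-test. The sketch below is an assumption for illustration only: the function name, the use of searches as the denominator, and the sample numbers are hypothetical and do not come from the paper.

```python
import math


def two_proportion_z_test(bookings_a, searches_a, bookings_b, searches_b):
    """One-sided z-test: is model B's booking rate higher than control A's?

    Hypothetical helper for illustration; not Airbnb's actual methodology.
    """
    rate_a = bookings_a / searches_a
    rate_b = bookings_b / searches_b
    # Pooled booking rate under the null hypothesis that both models are equal.
    pooled = (bookings_a + bookings_b) / (searches_a + searches_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / searches_a + 1 / searches_b))
    z = (rate_b - rate_a) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up traffic: control books 1,000 of 50,000 searches, the new model 1,100.
    z, p = two_proportion_z_test(1000, 50000, 1100, 50000)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

A small p-value (commonly below 0.05) would suggest the new model's lift in bookings is unlikely to be noise; the threshold itself is a convention, not something the researchers specify here.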

Airbnb's first AI search system laid the groundwork for more complex ones to come. The second adopted LambdaRank, an algorithm that applies supervised machine learning to solve ranking problems, while the final model -- a deep neural network (DNN) -- took into account roughly 195 features, including the price, amenities, and historical booking count; the price of listings that have Airbnb's Smart Pricing feature enabled; and the similarity of a listing to those a guest recently viewed.
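For readers curious how LambdaRank differs from an ordinary classifier, here is a minimal sketch of its core idea: every pair of listings in which one was booked and the other wasn't contributes a push on the scores, and that push is weighted by how much swapping the two would change NDCG, a standard ranking-quality metric. The function names, the `2**label - 1` gain, and the log2 position discount are conventional textbook choices assumed here, not details confirmed by the article.

```python
import math


def _gain(label):
    # Conventional NDCG gain; with booked=1 / not booked=0 this is just 0 or 1.
    return 2 ** label - 1


def _discount(position):
    # Conventional NDCG position discount (position is 0-indexed).
    return 1.0 / math.log2(position + 2)


def lambdarank_pushes(scores, labels):
    """Return one value per listing: positive means raise its score, negative lower it."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    position = {doc: pos for pos, doc in enumerate(order)}
    ideal_dcg = sum(
        _gain(label) * _discount(pos)
        for pos, label in enumerate(sorted(labels, reverse=True))
    )
    pushes = [0.0] * n
    if ideal_dcg == 0:
        return pushes  # Nothing was booked, so no pair carries a preference.
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # Only pairs where listing i should outrank listing j.
            # How much NDCG would change if i and j swapped positions.
            delta_ndcg = abs(
                (_gain(labels[i]) - _gain(labels[j]))
                * (_discount(position[i]) - _discount(position[j]))
            ) / ideal_dcg
            # Pairwise logistic term: large when the model currently ranks j above i.
            rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
            pushes[i] += delta_ndcg * rho
            pushes[j] -= delta_ndcg * rho
    return pushes


if __name__ == "__main__":
    # The booked listing (label 1) is currently ranked last of three.
    print(lambdarank_pushes([0.2, 0.9, 0.5], [1, 0, 0]))
```

In this toy example the booked listing is pushed up, and the unbooked listing sitting in the top slot is pushed down hardest, because mistakes near the top of the results cost the most NDCG.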

It wasn't all smooth sailing, of course.

Model training was a trial-and-error affair. The first iteration of the team's processing pipeline, which fed data in comma-separated values (CSV) format to TensorFlow models, used just a fraction of the graphics cards' processing power -- around 25 percent. (Optimizations resulted in a 17-fold speedup and drove utilization to around 90 percent.)

One of the neural networks the Airbnb team tested used the unique ID corresponding with listings as a feature. The idea was to index the IDs into an embedding (features mapped onto vectors of real numbers) that would encode each listing's unique properties, much like the recommender systems employed by Netflix and Amazon. But as the researchers explain, that turned out to be infeasible; embeddings need substantial amounts of data per item, and the listings are "subjected to constraints" from the physical world.
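To make the embedding idea concrete, here is a rough sketch of what indexing listing IDs into an embedding looks like: each ID maps to a row in a table of learnable numbers, and that row is fed to the network alongside the listing's ordinary features. The class name, the dimension of 8, and the sample IDs are hypothetical, and the training step that would actually adjust the vectors is omitted.

```python
import random


class ListingEmbedding:
    """Toy embedding table: one small vector of real numbers per listing ID."""

    def __init__(self, listing_ids, dim=8, seed=0):
        rng = random.Random(seed)
        # Map each listing ID to a row number in the table.
        self.index = {listing_id: row for row, listing_id in enumerate(listing_ids)}
        # Rows start as small random values; training would adjust them.
        self.table = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in listing_ids
        ]

    def lookup(self, listing_id):
        return self.table[self.index[listing_id]]


def build_model_input(embedding, listing_id, dense_features):
    # Concatenate the listing's embedding with its ordinary numeric features.
    return embedding.lookup(listing_id) + list(dense_features)


if __name__ == "__main__":
    emb = ListingEmbedding(["listing-1", "listing-2", "listing-3"], dim=8)
    # Hypothetical dense features, e.g. price and historical booking count.
    features = build_model_input(emb, "listing-2", [120.0, 14.0])
    print(len(features))
```

Each row only gets updated when its listing shows up in the training data, which is why a table like this needs many observations per item to learn anything useful.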

"Even the most popular listing can be booked at most 365 times in an year," they wrote, "[and] typical bookings per listing are much fewer."

Making matters more challenging, not all trends were obvious -- at least, not at first. Long views of listings seemed to correlate with bookings in testing, but when a model that simultaneously predicted the probability of booking and long view times was deployed online, it didn't result in an uptick. The team speculates that long views could be driven by a variety of factors such as high-end but high-priced listings, listings with long descriptions that are difficult to parse, or extremely unique and "sometimes humorous" listings, among other reasons.

On the feature engineering front, the team's investigations yielded a previously unconsidered factor that influenced occupancy: listings had varying minimum stay requirements, sometimes extending to months. They also led to the discovery of geographic preferences, like the fact that locations farther south into the west bay of San Francisco were more popular than locations across the bridges, which tend to be snarled with traffic.

So was it worth it in the end, despite all the roadblocks and setbacks? The team seems to think so.

"Feeding on the ubiquitous deep learning success stories, we started at the peak of optimism, thinking deep learning would be a drop in replacement for the GBDT model and give us stupendous gains out of the box," the researchers wrote. "A lot of initial discussions centered around keeping everything else invariant and replacing the current model with a neural network to see what gains we could get ... Over time we realized that moving to deep learning is not a drop-in model replacement at all; rather it's about scaling the system. As a result, it required rethinking the entire system surrounding the model."
