Pinterest has accumulated a great big heap of data in the course of running a website where users can pin, like, and simply click on a smorgasbord of content. And that data is going to good use. It’s enabled Pinterest to build a very smart search box.
In a blog post scheduled to go live today, Pinterest details QueryJoin, the complex “data collection” underlying its search engine.
The system seems to be encouraging engagement with Pinterest. Since Pinterest launched its Guided Search in April, the number of searches each user conducts has gone up by 25 percent, according to the blog post.
Facebook, Twitter, Google, and other web companies take user engagement data into consideration as well. But Pinterest is younger, and it needs to keep usage going up quickly. It’s growth time. So an engineering feat of this nature, one that can impress users and get them to come back again and again, is critical.
QueryJoin draws on data collections that Pinterest has previously talked about, like PinJoin and UserJoin, which take boards and re-pinning activity into consideration. The system also draws on demographic information, as well as search activity itself.
For example, QueryJoin looks at all the search queries a user makes during a single visit to Pinterest — “to learn how users refined their search queries to find things they were looking for,” Pinterest software engineer Dong Wang explains in today’s blog post.
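To make the idea of "refinement" concrete, here is a minimal sketch of how searches within one visit could be turned into query-to-query refinement pairs. It is an illustration only, not Pinterest's code: the `SearchEvent` record, its fields, and the rule that any two consecutive, different queries in a session count as a refinement are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class SearchEvent(NamedTuple):
    # Hypothetical log record; the real session-log schema is not described in the post.
    session_id: str
    timestamp: int
    query: str


def refinement_pairs(events):
    """Return (earlier_query, later_query) pairs for consecutive searches in a session."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)

    pairs = []
    for session_events in by_session.values():
        # Order each visit's searches by time, so a "refinement" means "searched next".
        session_events.sort(key=lambda e: e.timestamp)
        for prev, curr in zip(session_events, session_events[1:]):
            # Skip exact repeats of the same query; they refine nothing.
            if prev.query != curr.query:
                pairs.append((prev.query, curr.query))
    return pairs


events = [
    SearchEvent("s1", 1, "chicken"),
    SearchEvent("s1", 2, "chicken recipes"),
    SearchEvent("s1", 3, "chicken recipes healthy"),
    SearchEvent("s2", 1, "boots"),
]
print(refinement_pairs(events))
```

Counting how often each pair shows up across many users is one plausible way to surface the narrower suggestions a feature like Guided Search offers.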
Things become more involved from there, Wang writes:
We extract search activities from the session logs on a daily basis. For each search activity, we extract the information needed to build the QueryJoins and store them keyed by the date.
Every week, we create a partial QueryJoin by aggregating daily search activities together. For the Pins in the QueryJoin, we join them with PinJoins by image signatures. For each query, we find a set of PinJoins related to the query and then calculate the most relevant Pins and classify queries into categories. We also join the QueryJoin and the UserJoin (the collection of a user’s information such as their boards and Pins) by identifiers and calculate the gender and country stats.
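The weekly step Wang describes can be sketched roughly as follows. This is a toy version under stated assumptions, not Pinterest's pipeline: the tables `daily_activity`, `pin_joins`, and `user_joins`, the function `weekly_query_join`, and all field names are hypothetical. Click counts stand in for Pinterest's unspecified relevance calculation, and query categorization is left out.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical daily extracts, keyed by date: (user_id, query, clicked image signature).
daily_activity = {
    date(2014, 6, 2): [("u1", "wedding dress", "sigA"), ("u2", "wedding dress", "sigB")],
    date(2014, 6, 3): [("u1", "wedding dress", "sigA"), ("u3", "boots", "sigC")],
}

# Hypothetical PinJoin lookup: image signature -> id grouping copies of the same image.
pin_joins = {"sigA": "pin-100", "sigB": "pin-200", "sigC": "pin-300"}

# Hypothetical UserJoin lookup: user id -> demographic fields.
user_joins = {
    "u1": {"gender": "female", "country": "US"},
    "u2": {"gender": "female", "country": "GB"},
    "u3": {"gender": "male", "country": "US"},
}


def weekly_query_join(daily, week_start, top_k=2):
    """Aggregate seven days of activity into per-query stats (a partial QueryJoin)."""
    pins_per_query = defaultdict(Counter)
    genders = defaultdict(Counter)
    countries = defaultdict(Counter)
    for offset in range(7):
        day = week_start + timedelta(days=offset)
        for user_id, query, signature in daily.get(day, []):
            # Join the clicked Pin to its PinJoin by image signature.
            pin_id = pin_joins.get(signature)
            if pin_id is not None:
                pins_per_query[query][pin_id] += 1
            # Join with UserJoin by user identifier to accumulate demographic stats.
            user = user_joins.get(user_id)
            if user is not None:
                genders[query][user["gender"]] += 1
                countries[query][user["country"]] += 1
    return {
        query: {
            # Most-clicked Pins serve as a stand-in for "most relevant Pins".
            "top_pins": [pin for pin, _ in pins_per_query[query].most_common(top_k)],
            "gender": dict(genders[query]),
            "country": dict(countries[query]),
        }
        for query in sorted(pins_per_query.keys() | genders.keys())
    }


result = weekly_query_join(daily_activity, date(2014, 6, 2))
print(result["wedding dress"])
```

Keying the daily extracts by date keeps the weekly job simple: it only needs to read seven keys and merge them, and any week can be rebuilt from the stored days.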
In addition to powering Guided Search, the QueryJoin data collection also contributes to the knowledge base for search autocomplete and relevance, Wang notes.
For more on the data engineering at work for Pinterest’s search capability, check out the entire blog post.