Pinterest details the AI and taxonomy systems underpinning Trends

Last December, Pinterest announced the launch of Pinterest Trends, a feature that reveals the past year's most popular search keywords. Much like Google Trends and Bing's Keyword Research Tool, Trends spotlights terms that peaked over the past 12 months, using algorithmic data to sort by volume.

Trends became available globally this week in beta, and in the spirit of transparency, Pinterest detailed how the taxonomic system underpinning Trends canvasses the more than 200 billion ideas across 4 billion boards created by the social network's more than 320 million users. "Because people come to Pinterest to plan, we have unique insight into emerging trends," wrote Song Cui and Dhananjay Shrouty, software engineers on the Content Knowledge team. "We're able to gather these insights because Pinterest is fundamentally a different kind of platform where ... people from around the world come to save ideas and plan."

Taxonomy

Pinterest taps a taxonomic knowledge management system that enables content-level understanding, according to Cui and Shrouty. It classifies each entity and defines the relationships among them, with the goal of improving the accuracy of AI models on the platform involved in search and classification tasks.

The taxonomy -- which supports 17 languages for 20 countries, with more to come -- organizes popular topics throughout the platform and curates interests and nodes (Pins) for ads and ongoing campaigns. Interests are grouped together in a hierarchical parent-child tree structure, where each child is a subclass of its single parent, and the top-level taxonomy nodes define broad verticals -- e.g., "Women's Fashion" and "DIY and Crafts" -- that capture the general interests associated with Pins. (Child nodes up to 11 levels deep capture more granular topics.)
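To make the parent-child structure concrete, here is a minimal sketch of a single-parent interest tree. The `InterestNode` class and the child names "Shoes" and "Boots" are illustrative assumptions, not Pinterest's actual schema or data; only the top-level vertical name comes from the article.

```python
from typing import List, Optional


class InterestNode:
    """One node in a single-parent taxonomy tree (hypothetical structure)."""

    def __init__(self, name: str, parent: Optional["InterestNode"] = None):
        self.name = name
        self.parent = parent
        self.children: List["InterestNode"] = []

    def add_child(self, name: str) -> "InterestNode":
        # Each child has exactly one parent, mirroring the subclass relationship.
        child = InterestNode(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        # Walk up to the top-level vertical, then reverse to get root-first order.
        names = []
        node: Optional["InterestNode"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def level(self) -> int:
        # Top-level verticals are level 1; the article says the tree goes up to 11 levels.
        return len(self.path())


fashion = InterestNode("Women's Fashion")   # top-level vertical named in the article
shoes = fashion.add_child("Shoes")          # hypothetical child
boots = shoes.add_child("Boots")            # hypothetical grandchild

print(" > ".join(boots.path()), "| level", boots.level())
```

Because every node has a single parent, the path from any interest back to its vertical is unambiguous, which is what lets granular topics roll up into broad categories.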

Classifying content

A taxonomy wouldn't be of much use if there weren't a mechanism for mapping Pins to said taxonomy. That's why the Content Engineering team built Pin2Interest (P2I), a content-classifying system that ingests embeddings, text and visual inputs, and board names to create personalized recommendations and ranking features for other AI models. It's currently being used in production to rank Pins on users' home feeds and for advertisement targeting.

Mapping users and queries

The taxonomy's usefulness extends beyond trending topic tracking. A system dubbed User2Interest (U2I) uses it to map users to their interests. The Pins people engage with, along with those Pins' interest labels (which are generated by P2I), serve as signals that inform U2I's predictions in ads targeting, organic recommendations, and user-centric insights on the taxonomy. For instance, it can compute statistics like the number of users per taxonomy node to inform advertisers of shifts in overall interest.
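The users-per-node statistic can be sketched in a few lines. This is a toy illustration under stated assumptions: the `PARENT` map, the user IDs, and the choice to roll each user's interests up to every ancestor node are all hypothetical, and nothing here reflects how U2I is actually implemented.

```python
from collections import defaultdict

# Hypothetical child -> parent map; None marks a top-level vertical.
PARENT = {
    "Women's Fashion": None,
    "Shoes": "Women's Fashion",
    "Boots": "Shoes",
    "DIY and Crafts": None,
    "Knitting": "DIY and Crafts",
}

# Hypothetical user -> interest labels, standing in for U2I output.
USER_INTERESTS = {
    "u1": {"Boots"},
    "u2": {"Shoes", "Knitting"},
    "u3": {"Knitting"},
}


def ancestors_and_self(node, parent):
    """Return the node followed by each of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def users_per_node(user_interests, parent):
    """Count distinct users per node, crediting ancestors too (an assumption)."""
    members = defaultdict(set)
    for user, interests in user_interests.items():
        for interest in interests:
            for node in ancestors_and_self(interest, parent):
                members[node].add(user)  # sets avoid double-counting a user
    return {node: len(users) for node, users in members.items()}


counts = users_per_node(USER_INTERESTS, PARENT)
for node in sorted(counts):
    print(node, counts[node])
```

Comparing these counts across time windows is one simple way a shift in overall interest could be surfaced to advertisers.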

Creating and maintaining the taxonomy

Clearly, the interest taxonomy plays a vital role in matching users with content they're likely to enjoy. But how is it curated? According to Cui and Shrouty, it's a multi-step process involving the Resource Description Framework (RDF), the open source ontology development environment WebProtégé, and an engineering workflow that facilitates updates.

RDF is used to create graphs (which comprise nodes and the edges that connect them) while WebProtégé creates visualizations, both of which aid the team of humans tasked with vetting the taxonomy. As for the aforementioned engineering workflow, it sees Pinterest scientists take the RDF graphs in XML format and produce relational database tables for downstream usage.
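As a rough sketch of that XML-to-table step, the snippet below parses a tiny RDF/XML document with Python's standard library and loads it into an in-memory SQLite table. The `example.org` URIs, the use of `rdfs:Class` and `rdfs:subClassOf` to encode the hierarchy, and the `taxonomy_node` table layout are assumptions for illustration; the article does not describe Pinterest's actual vocabulary or schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical RDF/XML export with two interest nodes.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/interest/womens_fashion">
    <rdfs:label>Women's Fashion</rdfs:label>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://example.org/interest/shoes">
    <rdfs:label>Shoes</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/interest/womens_fashion"/>
  </rdfs:Class>
</rdf:RDF>
""".strip()


def rdf_to_rows(xml_text):
    """Flatten each class into a (node_id, label, parent_id) tuple."""
    root = ET.fromstring(xml_text)
    rows = []
    for cls in root.findall(f"{{{RDFS}}}Class"):
        node_id = cls.get(f"{{{RDF}}}about")
        label = cls.findtext(f"{{{RDFS}}}label")
        parent_el = cls.find(f"{{{RDFS}}}subClassOf")
        # Top-level verticals have no subClassOf edge, so parent_id stays None.
        parent_id = parent_el.get(f"{{{RDF}}}resource") if parent_el is not None else None
        rows.append((node_id, label, parent_id))
    return rows


def load_into_sqlite(rows):
    """Create a hypothetical taxonomy_node table and insert the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy_node (node_id TEXT PRIMARY KEY, label TEXT, parent_id TEXT)"
    )
    conn.executemany("INSERT INTO taxonomy_node VALUES (?, ?, ?)", rows)
    return conn


rows = rdf_to_rows(SAMPLE)
conn = load_into_sqlite(rows)
children = conn.execute(
    "SELECT label FROM taxonomy_node WHERE parent_id = ?",
    ("http://example.org/interest/womens_fashion",),
).fetchall()
print(children)
```

Once the graph is in tabular form, downstream consumers can answer questions such as "which interests sit under this vertical?" with an ordinary SQL query.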

For every iteration, Cui, Shrouty, and team build on and extend the taxonomy from the previous iteration. When new versions are created, operations like adding a new node, renaming an existing node, deleting a node, and merging two or more nodes are performed with heuristic rules.
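The four operations can be illustrated on a simple child-to-parent map. The article does not spell out Pinterest's heuristic rules, so the rules below are assumptions chosen for the sketch: deleting a node reattaches its children to its parent, and merging moves the source node's children under the target. The `Taxonomy` class and all node names other than "DIY and Crafts" are hypothetical.

```python
class Taxonomy:
    """Toy versioned-taxonomy editor; the rules here are illustrative assumptions."""

    def __init__(self):
        self.parent = {}  # node name -> parent name (None for top-level nodes)

    def children(self, name):
        return sorted(n for n, p in self.parent.items() if p == name)

    def add(self, name, parent=None):
        if name in self.parent:
            raise ValueError(f"{name!r} already exists")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[name] = parent

    def rename(self, old, new):
        if new in self.parent:
            raise ValueError(f"{new!r} already exists")
        self.parent[new] = self.parent.pop(old)
        for child in self.children(old):
            self.parent[child] = new  # repoint children at the new name

    def delete(self, name):
        grandparent = self.parent.pop(name)
        for child in self.children(name):
            self.parent[child] = grandparent  # assumed rule: children move up one level

    def merge(self, source, target):
        # Refuse to merge a node into itself or into one of its own descendants.
        node = target
        while node is not None:
            if node == source:
                raise ValueError("target must not be the source or its descendant")
            node = self.parent[node]
        for child in self.children(source):
            self.parent[child] = target  # assumed rule: target adopts source's children
        del self.parent[source]


t = Taxonomy()
t.add("DIY and Crafts")
t.add("Knitting", "DIY and Crafts")
t.add("Scarves", "Knitting")
t.add("Yarn Crafts", "DIY and Crafts")
t.add("Crochet", "Yarn Crafts")

t.rename("Knitting", "Knitting and Crochet")
t.merge("Yarn Crafts", "Knitting and Crochet")
merged_children = t.children("Knitting and Crochet")
t.delete("Knitting and Crochet")
print(merged_children, t.children("DIY and Crafts"))
```

Keeping each operation small and explicit makes it easier to review the difference between two taxonomy versions before the changes reach production systems.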

Adding to the taxonomy

Before a new topic is added to the taxonomy, the Content Engineering team first sends out candidate terms to its content, legal, and other divisions for review. Then an AI system called Neural Taxonomy Expansion (NTE), which is used in production for taxonomy expansion projects within Pinterest, predicts how likely each existing node is to be the parent of a candidate term. The predicted parents are reviewed manually to ensure the taxonomy is of high quality, after which taxonomists add the new nodes to the current taxonomy in WebProtégé.

In future work, Cui, Shrouty, and colleagues intend to work toward automatically building new types of relationships among entities in the taxonomy and associating attributes with them. "Moving forward, we're excited to keep evolving how we capture and understand trends in a more timely and systematic manner," they wrote.

Pinterest employs machine learning across its business -- not strictly for taxonomic purposes. Last October, the company revealed it leveraged AI that identifies and hides content displaying, rationalizing, or encouraging self-injury to achieve an 88% reduction in reports of such content. Lens, Pinterest's AI online/offline visual search tool that identifies things captured from Pins or by a smartphone and suggests related themes and products, can now recognize 2.5 billion home and fashion objects. And as early as 2015, Pinterest began using AI to surface Related Pins, or Pins tangentially relevant to those visually above them on the web and mobile.