How Graph Search became Facebook's super sexy love letter to structured data

Once upon a time, we identified people, places, and things on Facebook using just strings of characters -- letters in sequences that were never really sorted or categorized in any way.

Then along came Graph Search, Facebook's fascinating way of organizing every string (of characters) on the network into a node (object) or an edge (characteristic/attribute). I am a node; my friendship with fellow writer Drew Olanoff is an edge. Drew's "like" of Starbucks is an edge; Starbucks is a node. You get the idea.
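To make the node-and-edge idea concrete, here is a minimal sketch in Python. It is only an illustration of the general data structure, not Facebook's actual implementation; the `Node` and `Graph` classes, the ids, and the edge type names ("friend", "likes") are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An object in the graph: a person, a brand page, a book, and so on."""
    id: str
    kind: str   # hypothetical labels such as "person" or "page"
    name: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    # Each edge is a (source_id, edge_type, target_id) tuple.
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, source_id, edge_type, target_id):
        self.edges.append((source_id, edge_type, target_id))

    def neighbors(self, source_id, edge_type):
        """Return the nodes reachable from source_id over one edge of edge_type."""
        return [
            self.nodes[target]
            for source, etype, target in self.edges
            if source == source_id and etype == edge_type
        ]


graph = Graph()
graph.add_node(Node("u1", "person", "The author"))
graph.add_node(Node("u2", "person", "Drew Olanoff"))
graph.add_node(Node("p1", "page", "Starbucks"))
graph.add_edge("u1", "friend", "u2")   # a friendship is an edge
graph.add_edge("u2", "likes", "p1")    # a "like" is an edge, too

print([node.name for node in graph.neighbors("u2", "likes")])
```

A query like "pages my friends like" then amounts to following a "friend" edge and then a "likes" edge.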

So, how did Facebook turn those strings into nodes? Especially the non-person, non-brand-page nodes: How did the letters "c-h-o-k-e" that you typed into your profile in 2009 become Choke, the 2001 novel by Chuck Palahniuk (another node), which has likes from 43 of your friends (also nodes!) and a complete synopsis on its own Facebook page? When did the letters become entities and, as part of Graph Search, fully searchable?

That's the topic of a Facebook Engineering blog post today, a long read that's well worth the effort for data nerds and programmers -- and any Facebook user who's just interested in how the whole thing works under the hood.

In a mini-treatise on what the company calls "entities," the team writes that when you type "people who work at" into the Graph Search bar, you can see entities at work in the suggested list of employers that appears.

From the post:

People don’t just have connections to other people. They may use Facebook to check in to restaurants and other points of interest; they might show their favorite books and movies on their timeline; and they may also list their high school, college, and workplace. These 100+ billion connections form the entity graph ... There are even connections between entities: a book has an author, a song has an artist, and movies have actors. All of these are represented by different kinds of edges in the graph.

Entities get a dedicated team of engineers at Facebook. To start building the entity graph, Facebookers used the form fields that have been around on Facebook since 2006 or so -- you know, the little boxes where you'd type in the movies or music you liked or what your religion is. Using the box each string showed up in, the power of statistics and probability, and related Wikipedia data, the team turned those strings into entities -- actual structured data, something the company has been publicly focused on since the launch of Timeline in 2011 and the rollout of Facebook Actions in early 2012.

And that kind of string-to-object conversion was awesome, until they got to the profile of the idiot who wrote in the "Movies" field: "I only like movies with boobs lol." Or the person who wrote "The Twilight books" in the "Books" section instead of "The Twilight Saga" or "Breaking Dawn." Or movies like "Miracle on 34th Street," which has been through multiple remakes using the same title.

For unmatchable strings, Facebook created millions of Pages as "fallbacks" to store the possibly junk data. The site also started recommending entities to people who had typed in closely related strings.
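The match-suggest-or-fall-back flow described above might look something like the following sketch. It is a rough illustration under loud assumptions: the engineering post talks about statistics and Wikipedia data, whereas this example swaps in plain string similarity from Python's standard `difflib` module. The `CATALOG` contents, the `resolve` function, and the 0.6 cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical catalog of known entities, keyed by the profile field
# the string was typed into.
CATALOG = {
    "books": ["The Twilight Saga", "Breaking Dawn", "Choke"],
    "movies": ["Miracle on 34th Street"],
}


def resolve(field, text, fallbacks, cutoff=0.6):
    """Classify a raw profile string as matched, suggested, or fallback.

    `fallbacks` is a dict that collects unmatchable strings per field,
    standing in for the fallback Pages mentioned in the article.
    """
    known = {name.lower(): name for name in CATALOG.get(field, [])}
    key = text.strip().lower()

    # 1. Exact (case-insensitive) match within the same field.
    if key in known:
        return ("matched", known[key])

    # 2. Closely related string: recommend entities instead of guessing.
    close = get_close_matches(key, list(known), n=3, cutoff=cutoff)
    if close:
        return ("suggested", [known[c] for c in close])

    # 3. Nothing close enough: store the raw string as a fallback.
    fallbacks.setdefault(field, []).append(text)
    return ("fallback", text)


fallbacks = {}
print(resolve("books", "choke", fallbacks))
print(resolve("books", "The Twilight books", fallbacks))
print(resolve("movies", "I only like movies with boobs lol", fallbacks))
```

Note what this sketch cannot do: a title like "Miracle on 34th Street" matches exactly, but nothing here says which remake the person meant. That kind of ambiguity needs more context than string comparison alone.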

As Graph Search continues to expand to include more people, places, and things -- especially Open Graph-type things -- behind-the-scenes work on entities will continue to get better, faster, and more accurate. If you see a Facebook entities engineer, please give him or her a hug.

Image credit: Jolie O'Dell/VentureBeat; Facebook
