Web content analysis startup Diffbot has launched a new beta API called “Page Classifier” that can reveal the page type and language behind any URL, the company announced today.

Diffbot’s first APIs focused on scanning, parsing, and extracting information from web pages. Developers could use them to scan articles or homepages and pull out the most meaningful content. Now the company is expanding its developer lineup with Page Classifier, which identifies a page’s type and language and could have a variety of uses.

Social bookmarking app Springpad already uses the Page Classifier API to enhance links by showing additional content related to them. Developers can try Page Classifier through a Google Chrome extension that analyzes Twitter updates and shows what kinds of links are attached to each status, including photos, articles, and products.

“We’re constantly surprised by how developers are using our APIs and how devs might take advantage of the Classifier,” Diffbot CEO Mike Tung told VentureBeat via email. “Internally, we’re already benefiting — our classification of a day in Twitter showed us images are most of what’s being shared, so we’ve prioritized development of our Image API, which will be the next one we release.”

Palo Alto, Calif.-based Diffbot was founded in 2008 and has raised $2 million in seed funding from investors including Brad Garlinghouse, Sky Dayton, Andy Bechtolsheim, Joi Ito, Maynard Webb, and Matrix Partners.

Diffbot has also made an infographic that shows off the data-mining potential of the new API with an analysis of Twitter links. Check it out below.


