Until about a year ago, you were in an unenviable position if you wanted to build a web site that used or created semantic data: You pretty much had to build all your tools from scratch. Calais, a semantic web service being developed by information giant Thomson Reuters, is offering to help change that with a service it’s rolling out today.
The “semantic web” is a general term for the part of the Internet’s data that machines can understand and interpret. For example, if you wrote “wrench in the library,” a semantic application might automatically link the phrase to Colonel Mustard or to a history of wrenches, depending on the context.
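To make the idea of context-dependent linking concrete, here is a toy sketch in Python. It is not how Calais works; the candidate list, the cue words and the `link_entity` function are all invented for illustration, and a real semantic application would rely on far richer language analysis.

```python
import re

# Hypothetical mini knowledge base: each possible meaning of a term lists
# context words that make that meaning more likely. Invented for illustration.
CANDIDATES = {
    "wrench": [
        {"link": "Colonel Mustard (Clue suspect)",
         "cues": {"library", "candlestick", "colonel", "suspect"}},
        {"link": "History of wrenches",
         "cues": {"bolt", "tool", "mechanic", "invented"}},
    ],
}


def link_entity(term, text):
    """Return the link whose cue words overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_link, best_score = None, 0
    for candidate in CANDIDATES.get(term, []):
        score = len(candidate["cues"] & words)  # count shared context words
        if score > best_score:
            best_link, best_score = candidate["link"], score
    return best_link


print(link_entity("wrench", "wrench in the library"))
print(link_entity("wrench", "The mechanic used a wrench on the bolt."))
```

The same word resolves to different links because the surrounding words differ; with no helpful context at all, the sketch declines to guess and returns `None`.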
Calais is a set of tools for creating and using that data. It lets you build services in areas like news, travel or advertising with its generalized platform — what its de facto CEO within Thomson Reuters, Thomas Tague, calls the “plumbing” of the semantic web.
Calais has been open since January, when it invited developers to try it free of charge. Some early efforts have emerged since then, like Gnosis, a Firefox plugin that helps you research data points like companies or people as you browse the web, and LinkedFacts, a demonstration of how semantic technology can enhance news browsing. But those developers built without any real guarantee that Thomson Reuters would maintain the service.
Today, Calais is providing some much-needed guarantees. A free version, OpenCalais, will remain available for developers, while companies and publishers will have access to subscription-based professional and enterprise versions. They’ll get enough processing power for millions of daily data “transactions,” as well as service and support. Most important, they’ll sign annual contracts that require Thomson Reuters to keep the service running.
Those packages should give existing companies a green light to put serious resources into development atop Calais. Tague told me at a recent San Francisco meeting that the guarantees are timely — in the eight months Calais has been around, he said, adoption has moved from geeks who just wanted to play with the tools, to small publishers and startups, to large companies.
As to what will come out of Calais, it’s hard to say. One somewhat frustrating point when talking about any generalized semantic web platform is that it’s early days, and developers are still figuring out what they can do with the tools. There are some early ventures. One is Twine, which helps its users collect and organize large volumes of information, for easier search and discovery. There’s also Peer39, which provides contextual targeting for advertising, and TripIt, which automatically organizes your travel itinerary. Others, from BlueOrganizer to Zemanta, are popping up every month.
For Tague, the best semantic web apps have yet to be dreamed up. Some of the juiciest opportunities are around news, where publishers will soon be able to use Calais to automatically create news subject hubs and link paths to more information; an early Reuters project called Gist does some of that. But other ideas may prove even better.
For the geeks among you, here’s a short list of some other tools that the Calais team is working on: First, it just released a tool to automatically generate semantic metadata for any existing page on the web. It’s also working on de-referenceable URIs, which help define the characteristics of a resource located at a web address. And finally, it’s putting the finishing touches on its ontology, for public release later this year.
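For a rough sense of what page-level semantic metadata and de-referenceable URIs might look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `example.org` addresses, the `RESOURCES` table standing in for the web, and the `dereference` and `describe_page` helpers are hypothetical and say nothing about the format Calais actually uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement of metadata: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# In-memory stand-in for the web. A real de-referenceable URI would be
# fetched over HTTP; these addresses and properties are made up.
RESOURCES = {
    "http://example.org/company/acme": {"type": "Company", "name": "Acme Corp"},
}


def dereference(uri):
    """Look up the characteristics of the resource at a URI (empty if unknown)."""
    return RESOURCES.get(uri, {})


def describe_page(page_url, entity_uris):
    """Build metadata triples for a page from the entities it mentions."""
    triples = []
    for uri in entity_uris:
        triples.append(Triple(page_url, "mentions", uri))
        for key, value in dereference(uri).items():
            triples.append(Triple(uri, key, value))
    return triples


for triple in describe_page("http://example.org/news/1",
                            ["http://example.org/company/acme"]):
    print(triple)
```

The point of the design is that the page's metadata stores only an address for each entity, and anyone who wants to know more about that entity can follow the address to its description.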