2008 may well be shaping up to be the year that semantic technology takes off: Semantic search engine Hakia has begun licensing its technology, the intelligent organizer Twine is readying for launch, and now natural language search engine Powerset is also considering a near-term launch, as TechCrunch recently noted.
I’ve met with Powerset twice recently, and their progress even over that short timespan appears to have been considerable. A month ago two of Powerset’s founders, Barney Pell and Lorenzo Thione, were showing me how a new index (the data and rules that determine results for search engines) dramatically improved search results within Wikipedia, which Powerset uses as its testing ground.
They’re now mostly happy with the relevance of their search results and are working to build the features and interfaces that will determine how users interact with the engine. The San Francisco-based company has gone from refining its search indexing abilities to building out some fascinating tools that can parse, chop, mash-up and re-display the sentences and paragraphs that are crawled by its engine.
In our most recent meeting, Pell laid out Powerset’s new unofficial motto: “We’re not a search engine.” That’s not a surprising assertion, considering that all of the semantic startups have been trying to dodge the hyped-up “Google Killer” label since their inception. But it’s worth explaining exactly how Powerset, a company that wants to be used to search for information on the Internet, is not a search engine.
Say you’re searching on Google to learn about naval history. Here’s your problem: When Google returns its thousands of results, you actually have to go to web pages individually to see if they’re what you want. If they are, you then have to search through those pages for the information you want. If you want to know about a particular naval battle but can’t remember its name, the search could quickly become frustrating. A search for naval battles during the Civil War will be helpful, but it requires some effort to hunt through the results (try it yourself).
Powerset’s technology, however, can provide sets of results based either on entire web pages — as Google does — or on specific sections of those pages, which is helpful if they’re long, like Wikipedia entries. But where the company is headed is toward reading through pages for you and arranging or condensing the information it finds to just tell you an answer.
That means I could potentially ask Powerset, “What were the major naval battles of the Civil War?”, and immediately find in a list of results what I was looking for, the bizarre fight between the Monitor and the Merrimac. If I had to go to a page to search through the information, Powerset might have some tricks to aid in the search, pointing out sentences that seemed to be likely matches.
Figuring out the exact form those tricks will take is Powerset’s immediate challenge. I got to see a handful of tools and features, but only after swearing secrecy, in part because there’s no certainty as to what will be included in the company’s public release. In that respect, Powerset faces the same challenge as Twine and Hakia: Because it’s trying to create new ways of navigating the Internet, there’s no tried-and-true model to copy.
Before launching, Powerset will have to settle on a way to return data it thinks is of the most use in a search. And of course, there’s the challenge of processing millions of these requests at a time; whatever it does has to scale up for public use. Overcoming those challenges could drag out the launch date. (Some of Powerset’s team have vowed not to shave their moustaches until their public release, a project they’re calling Powerstache.)
In the meantime, Powerset will keep stressing that it’s no Google, simply because the team doesn’t want to invite derision by making grandiose claims. Yet that raises the question of whether the company could ever approach Google’s value. After all, it hasn’t yet shown that it can expand beyond Wikipedia. And what’s the point of getting excited about a company that’s basically a glorified research tool?
The answer lies in the possibilities. Google is still the best way to hunt through vast numbers of silos (web pages) containing information when you’re looking for a specific fact. No new technology will seriously challenge that ability for a year or two, at least. But a technology like Powerset could short-circuit Google’s process by just giving you the damn fact already, instead of listing relevant websites.
We like Google for its speed and efficiency. We’ll like Powerset, or something similar, even more. A decade from now, we’ll look back and wonder how we survived with just Google; it’s a whole new ball game, one that will give us a whole new suite of tools indispensable for navigating the Internet. It remains to be seen how well Powerset will play into that future.