Powerset, a San Francisco search engine company, will announce Friday it has won exclusive rights to significant search engine technology it says may help propel it past Google.

The technology, developed at Palo Alto Research Center (PARC) in Silicon Valley, seeks to understand the meaning of words and the relationships between them, akin to the way humans understand language, and is thus called “natural language” technology. It has been thirty years in the works.

The deal is significant because practical use of linguistic technology has eluded Google. The giant search engine has said it wants to implement language-understanding technology one day. However, Google says its tests of linguistic approaches haven’t made any difference in its results so far (see our Q&A with Google Director of Research Peter Norvig below; also see his speech last year about this at Berkeley). Google has shunned reliance on word meanings, instead focusing on finding the most popular pages that contain the keywords. As for relationships between words, Google relies on statistical relationships, such as how frequently words appear together, but not on linguistic relationships.
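To make the distinction concrete, here is a minimal sketch of a purely statistical relationship between words: counting how often two words appear in the same document. It is an illustration only, not a description of Google’s actual system; the function name `cooccurrence_counts` and the two sample sentences are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words.

    Hypothetical helper for illustration; word order and grammar are ignored.
    """
    counts = Counter()
    for doc in documents:
        # Lowercase, split on whitespace, and de-duplicate within the document.
        words = sorted(set(doc.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Made-up sample sentences (assumption, not real search data).
docs = ["IBM acquired Lotus", "Lenovo acquired IBM"]
counts = cooccurrence_counts(docs)
print(counts[("acquired", "ibm")])
```

Both sample sentences contribute to the same pair of “acquired” and “ibm,” even though IBM is the buyer in one and the target in the other. That lost distinction is what a linguistic approach tries to recover.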

The deal with PARC, which is owned by Xerox, is Powerset’s answer to its critics, such as search expert Danny Sullivan, who all but heaped scorn on Powerset’s ambitions when we first wrote about them. At the time, Sullivan didn’t know the degree to which Powerset has focused on this.

The move is significant because Google’s own technology, based on “PageRank,” has been virtually replicated by other search engines like Yahoo and MSN, and so isn’t as difficult to emulate as it was a few years ago. Powerset could possibly steal a lead if it improves search results by a significant measure with natural language and simultaneously incorporates a near-equivalent to Google’s existing capabilities. Powerset has been hiring lots of Yahoo search experts and others to help it do that.

We’d be surprised if Google doesn’t scrutinize Powerset closely, perhaps even consider an acquisition (although in our Q&A today with Norvig, below, he says Google is now working on natural language after all). Until now, though, Google’s disciplined focus on a statistical approach may have blinded it to the possibilities of a linguistic approach, Powerset’s executives say. Powerset plans to launch the search engine publicly this year.

Powerset’s initial talks with PARC last year were enough to convince two well-known Silicon Valley venture capital firms, Foundation Capital and the Founders Fund, to invest in Powerset at a very high price. The firms and other individuals invested $12.5 million, and own less than a third of the company in return.

The venture capitalists made the investment based on an assumption that Powerset would complete the licensing deal. Negotiations on the deal, just completed, were so secretive that Powerset’s executives hid a Xerox PARC scientist, Ron Kaplan, in a back room when VentureBeat stopped by for an interview last year. Kaplan, who has led the “natural language” group for several years, joined Powerset as chief technology officer in July. This is a coup for Powerset, because Kaplan did not respond to some early probes from Google. In an interview, Kaplan said he didn’t believe Google took natural language seriously enough. “Deep analysis is not what they’re doing. Their orientation is toward shallow relevance, and they do it well.” Powerset, however, “is much deeper, much more exciting. It really is the whole kit and caboodle.” While natural language has been a vexing problem for decades, Kaplan said he believes it is ready for prime time.

Chief executive Barney Pell approached Kaplan in September 2005 and convinced him to help make a prototype search engine. Over time, Pell negotiated with Kaplan to bring his entire PARC research unit to bear on the problem.

Powerset’s license of PARC’s technology covers the broad areas of consumer search and published content. In return, Powerset will pay PARC a royalty fee, which is capped at an undisclosed level, and other compensation to PARC for the employment of its researchers on the Powerset project. PARC also gets an equity stake in Powerset. Powerset has the right to offer jobs to the PARC employees, if it wants.

Powerset has picked off a dozen high-profile search experts from Yahoo and elsewhere. Unfortunately, it revealed their names to VentureBeat only on condition we not publish them. One name now public is Tim Converse, a Yahoo Web spam expert. Powerset now has around 40 employees.

Powerset let VentureBeat see a limited demo of its technology this week, and we were impressed. To recap, VentureBeat’s earlier descriptions of Powerset’s technology (here and here) used the example query “Who acquired IBM?” Google will give you lots of results about companies that IBM acquired, even though that’s not what you asked. Powerset, on the other hand, will give results about the companies that acquired IBM units, including Lenovo and AT&T. Moreover, in our demo this week, Powerset showed it can answer more complex questions, such as “Who did IBM acquire in 1996?” Here, Google completely breaks down. Better still, the technology appears to learn over time, and Powerset can use abstractions. For example, it scans the Web and finds that Hillary Clinton is associated with words like “liberal” and “democrat” and “leader.” So later, when you ask Powerset “What do liberal democrats say about healthcare policy?,” it will be smart enough to include what Hillary Clinton has said about healthcare policy, along with what other liberal democrats have said.
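The “Who acquired IBM?” example can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not Powerset’s or PARC’s actual method, which has not been disclosed in detail. The `Fact` type, the helper names `keyword_match` and `who_acquired`, and the three sample facts are all hypothetical; the facts are hand-written for the example rather than extracted from the Web.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A hypothetical (subject, verb, object) triple, as a parser might extract it."""
    subject: str
    verb: str
    obj: str


# Illustrative, hand-written facts (assumption: a real system would extract these).
FACTS = [
    Fact("IBM", "acquired", "Lotus"),
    Fact("Lenovo", "acquired", "IBM PC division"),
    Fact("AT&T", "acquired", "IBM Global Network"),
]


def keyword_match(terms, facts):
    """Return facts whose text contains every term, ignoring who did what to whom."""
    lowered = [t.lower() for t in terms]
    return [f for f in facts if all(t in " ".join(f).lower() for t in lowered)]


def who_acquired(target, facts):
    """Return the acquirers: subjects of 'acquired' facts whose object mentions target."""
    return [
        f.subject
        for f in facts
        if f.verb == "acquired" and target.lower() in f.obj.lower()
    ]


print(len(keyword_match(["acquired", "IBM"], FACTS)))
print(who_acquired("IBM", FACTS))
```

The keyword matcher returns all three facts, including the one in which IBM is the buyer, because it only checks that the words are present. The role-aware lookup keeps only the facts where IBM appears as the thing being acquired, which is the distinction the demo is meant to show.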

Clearly, Powerset faces challenges. Even if its technology does prove to be useful, it isn’t clear how long it will keep any lead (in natural language) in the face of an onslaught from Google. Another challenge is changing the search behavior of people who are used to keyword searches. [Update: For a healthy dose of skepticism, see the NYT story too.]

[Update and response to Techcrunch criticism here.]


Q&A with Peter Norvig, Director of Research, Google

(Note: This interview is based on an email exchange Thursday afternoon, before Powerset’s announcement. Peter did not know the specifics of Powerset’s announcement, because Powerset had requested we not disclose it until Thursday evening. Google’s policy is not to comment on competitors.)

VentureBeat: Peter, you’ve been critical of linguistic approaches to search (semantics search), and VentureBeat pointed to your speech at Berkeley about this in a past post. I’m wondering whether there’s been any change recently in your thinking?

Norvig: I would characterize my opinion on semantics search as realistic. My position is: we do what’s best for the users now, and over the longer-term, we investigate technologies that will help in the future. It would be great if we understood every word of every document and every query, but that’s a long way off. In the meantime, we develop technologies that provide the best overall user experience.

For example, fill-in-the-blank search (see here). You mentioned the query [Who shot Cheney] and said the answer should be “nobody” but I think our answer to [* shot Cheney], where the * means fill-in-the-blank, is better (see here). These results tell you about “an incident when another hunter shot Cheney – years ago” as well as speculation about “Suppose this 78-year-old man accidentally shot Cheney?” We think that the level of understanding we get from things like fill-in-the-blank today is useful, and we will keep doing things like that. We will also do more with question answering (see here; it gives a factual answer at the top of the page).

I have always believed (well, at least for the past 15 years) that the way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.

VentureBeat: On what basis did you decide that natural language wasn’t going to help? That is, what sort of help did you get to make that determination? Were these conversations with natural language experts? If so, who?

Norvig: This wasn’t a decision I made, I was just reporting on results of what has worked so far. I have no theoretical stance on this.

VentureBeat: Has Google hired anyone yet to focus on the possibility of using symbolic/deep approaches?

Norvig: Google has several teams focused on natural language and dozens of Googlers with a PhD in the field, including myself.

VentureBeat: Have you ever talked with the natural search folks at PARC, i.e., Ron Kaplan, to see whether anything his team has developed is worth integrating into Google?

Norvig: I know Ron and the other natural language people at PARC very well. I worked for a summer with PARC’s Martin Kay and co-authored a book with him.

There are a handful of people at Google who worked at PARC on natural language in the past. And Barney worked for me at my previous job at NASA, so we talk often as well.

VentureBeat: What if a competitor (Powerset/Hakia) were to license that PARC technology out from under Google’s nose? I realize you don’t comment on competitors, so I’m asking generically (about any company). What if this company acquired full and exclusive rights to PARC’s technology? Would that concern you?

Norvig: We feel there is a lot to do in the field of search, with many ways to approach it. Search remains at the core of everything Google does and we are always working to improve it.

