Powerset, a San Francisco search engine company, will announce Friday it has won exclusive rights to significant search engine technology it says may help propel it past Google.
The technology, developed at Palo Alto Research Center (PARC) in Silicon Valley, seeks to understand the meanings between words, akin to the way humans understand language — and is thus called “natural language.” It has been thirty years in the works.
The deal is significant because practical use of linguistic technology has eluded Google. The giant search engine has said it wants to implement language-understanding technology one day. However, tests of linguistic approaches haven’t made any difference in Google’s results so far, it says (see our Q&A with Google Director of Research Peter Norvig below; also see his speech last year about this at Berkeley). Google has shunned reliance on word meanings, instead focusing on finding the most popular pages that contain the keywords. As for relationships between words, Google relies on statistical relationships, such as how frequently they appear together, but not on linguistic relationships.
The deal with PARC, which is owned by Xerox, is Powerset’s answer to its critics, such as search expert Danny Sullivan, who all but heaped scorn on Powerset’s ambitions when we first wrote about them. At the time, Sullivan didn’t know the degree to which Powerset has focused on this.
The move is significant because Google’s own technology, based on “page rank,” has been virtually replicated by other search engines like Yahoo and MSN, and so isn’t as difficult to emulate as it was a few years ago. Powerset could possibly steal a lead if it improves search results by a significant measure with natural language and simultaneously incorporates a near-equivalent to Google’s existing capabilities. Powerset has been hiring lots of Yahoo search experts and others, to help it do that.
We’d be surprised if Google doesn’t scrutinize Powerset closely, perhaps even consider an acquisition (although in our Q&A today with Norvig, below, he says Google is now working on natural language after all). Until now, though, Google’s disciplined focus on a statistical approach may have blinded it to the possibilities of a linguistic approach, Powerset’s executives say. Powerset plans to launch the search engine publicly this year.
Powerset’s initial talks with PARC last year were enough to convince two well-known Silicon Valley venture capital firms, Foundation Capital and the Founders Fund, to invest in Powerset at a very high price. The firms and other individuals invested $12.5 million, and own less than a third of the company in return.
The venture capitalists made the investment based on an assumption that Powerset would complete the licensing deal. Negotiations on the deal, just completed, were so secretive that Powerset’s executives hid a Xerox PARC scientist, Ron Kaplan, in a back room when VentureBeat stopped by for an interview last year. Kaplan, who has led the “natural language” group for several years, joined Powerset as chief technology officer in July. This is a coup for Powerset, because Kaplan did not respond to some early probes from Google. In an interview, Kaplan said he didn’t believe Google took natural language seriously enough. “Deep analysis is not what they’re doing. Their orientation is toward shallow relevance, and they do it well.” Powerset, however, “is much deeper, much more exciting. It really is the whole kit and caboodle.” While natural language has been a vexing problem for decades, Kaplan said he believes it is ready for prime-time.
Chief executive Barney Pell approached Kaplan in Sept. 2005, and convinced Kaplan to help make a prototype search engine. Over time, Pell negotiated with Kaplan to bring his entire PARC research unit to bear on the problem.
Powerset’s license of PARC’s technology covers the broad areas of consumer search and published content. In return, Powerset will pay PARC a royalty fee, which is capped at an undisclosed level, and other compensation to PARC for the employment of its researchers on the Powerset project. PARC also gets an equity stake in Powerset. Powerset has the right to offer jobs to the PARC employees, if it wants.
Powerset has picked off a dozen high-profile search experts from Yahoo and elsewhere. Unfortunately, it revealed their names to VentureBeat only on condition we not publish them. One name now public is Tim Converse, a Yahoo Web spam expert. Powerset now has around 40 employees.
Powerset let VentureBeat see a limited demo of the Powerset technology this week, and we were impressed. To recap, VentureBeat’s earlier descriptions of Powerset’s technology here and here used the example of the search “Who acquired IBM?” Google will give you lots of results about companies that IBM acquired, even though that’s not what you asked. Powerset, on the other hand, will give results of the companies that acquired IBM units, including Lenovo and AT&T. Moreover, in our demo this week, Powerset showed it can answer more complex questions, such as “Who did IBM acquire in 1996?” Here, Google completely breaks down. Better, the technology appears to learn over time. Powerset can use abstractions. For example, it scans the Web and finds that Hillary Clinton is associated with words like “liberal” and “democrat” and “leader.” So later, when you ask Powerset “What do liberal democrats say about healthcare policy?,” it will be smart enough to include what Hillary Clinton has said about healthcare policy, among other liberal democrats.
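To make the “Who acquired IBM?” distinction concrete, here is a toy sketch of our own. It is not Powerset’s or Google’s code, neither of which we have seen, and every name in it (Fact, FACTS, keyword_search, who_acquired) is hypothetical. It assumes sentences have already been broken by hand into subject, verb and object, which is the hard part a real natural language system would have to do automatically.

```python
# Illustrative only: a toy contrast between keyword matching and
# role-aware matching. All names and data here are hypothetical and
# hand-written; nothing below reflects any real search engine's code.
from typing import List, NamedTuple


class Fact(NamedTuple):
    subject: str  # who did the acquiring
    verb: str
    obj: str      # what was acquired
    text: str     # the original sentence


FACTS = [
    Fact("IBM", "acquired", "Lotus", "IBM acquired Lotus"),
    Fact("Lenovo", "acquired", "IBM's PC unit", "Lenovo acquired IBM's PC unit"),
]


def keyword_search(query_terms: List[str], facts: List[Fact]) -> List[str]:
    """Return sentences containing every query term, ignoring word roles."""
    hits = []
    for fact in facts:
        words = fact.text.lower().replace("'s", "").split()
        if all(term.lower() in words for term in query_terms):
            hits.append(fact.text)
    return hits


def who_acquired(target: str, facts: List[Fact]) -> List[str]:
    """Return only the subjects whose acquired object mentions the target."""
    return [
        f.subject
        for f in facts
        if f.verb == "acquired" and target.lower() in f.obj.lower()
    ]


print(keyword_search(["acquired", "IBM"], FACTS))  # both sentences match
print(who_acquired("IBM", FACTS))                  # only the acquirer of an IBM unit
```

The keyword version returns both sentences because each contains “acquired” and “IBM.” The role-aware version returns only Lenovo, because it checks which side of the verb IBM sits on.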
Clearly, Powerset faces challenges. Even if its technology does prove to be useful, it isn’t clear how long it will keep any lead (in natural language) in the face of an onslaught from Google. Another challenge is changing people’s search behavior, since users are accustomed to keyword searches. [Update: For a healthy dose of skepticism, see the NYT story too].
[Update and response to Techcrunch criticism here.]
************************
Q&A with Peter Norvig, Director of Research, Google
(Note: This interview is based on an email exchange Thursday afternoon, before Powerset’s announcement. Peter did not know the specifics of Powerset’s announcement, because Powerset had requested we not disclose it until Thurs. evening. Google’s policy is not to comment on competitors).
VentureBeat: Peter, you’ve been critical of linguistic approaches to search (semantics search), and VentureBeat pointed to your speech at Berkeley about this in a past post. I’m wondering whether there’s been any change recently in your thinking?
Norvig: I would characterize my opinion on semantics search as realistic. My position is: we do what’s best for the users now, and over the longer-term, we investigate technologies that will help in the future. It would be great if we understood every word of every document and every query, but that’s a long way off. In the meantime, we develop technologies that provide the best overall user experience.
For example, fill-in-the-blank search (see here). You mentioned the query [Who shot Cheney] and said the answer should be “nobody” but I think our answer to [* shot Cheney], where the * means fill-in-the-blank, is better (see here). These results tell you about “an incident when another hunter shot Cheney - years ago” as well as speculation about “Suppose this 78-year-old man accidentally shot Cheney?” We think that the level of understanding we get from things like fill-in-the-blank today is useful, and we will keep doing things like that. We will also do more with question answering (see here; it gives a factual answer at the top of the page).
I have always believed (well, at least for the past 15 years) that the way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.
VentureBeat: On what basis did you decide that natural language wasn’t going to help? I.E. what sort of help did you get to make that determination? Were these conversations with natural language experts? If so, who?
Norvig: This wasn’t a decision I made, I was just reporting on results of what has worked so far. I have no theoretical stance on this.
VentureBeat: Has Google hired anyone yet to focus on the possibility of using symbolic/deep approaches?
Norvig: Google has several teams focused on natural language and dozens of Googlers with a PhD in the field, including myself.
VentureBeat: Have you ever talked with the natural search folks at PARC, i.e., Ron Kaplan, to see whether anything his team has developed is worth integrating into Google?
Norvig: I know Ron and the other natural language people at PARC very well. I worked for a summer with PARC’s Martin Kay and co-authored a book with him.
There are a handful of people at Google who worked at PARC on natural language in the past. And Barney worked for me at my previous job at NASA, so we talk often as well.
VentureBeat: What if a competitor (Powerset/Hakia) were to license that PARC technology out from under Google’s nose? Realize you don’t comment on competitors, so asking generically (about any company). What if this company acquired full and exclusive rights to PARC’s technology. Would that concern you?
Norvig: We feel there is a lot to do in the field of search, with many ways to approach it. Search remains at the core of everything Google does and we are always working to improve it.
32 Comments
-
Ryan said:
I have never seen a company get so much hype and not even have a finished product.
-
keanu said:
I think the biggest challenge Powerset will face is scalability. For English search, maybe Powerset has found the right way, but for Chinese, Japanese, or other languages, will the technology still work well? From this point of view, Google’s technology makes multi-language search easier to solve.
-
You Mon Tsang said:
Having spent many years making practical use of search, I agree with Norvig. That said, there will come a time when NLP will work well. I just would not bet on it in the next 5 years.
-
Emre Sokullu said:
What’s this hype about Powerset! We can’t even test it, let us see a demo or at least some sample search queries & results…
-
ashley said:
This is the third time I visit your blog and I like your articles. If you guys love gadgets, please visit my blog at: http://www.funnygadget.com Thanks!
-
Search Engines WEB said:
What do liberal democrats say about healthcare policy?
Presumably the current SERPs returned by Google would include pages that incorporate Hillary Clinton’s response on health care.
A general query would probably return web pages encompassing virtually all high-profile liberal Democrats.
-
Matt Marshall said:
Re other languages, good point. Powerset said it is mindful of this, but like so much of its operations, its exact progress here is secretive, so we just don’t know. It’s certainly a major challenge.
-
Nathaniel said:
It seems like you are putting a huge value on natural search and presuming that it *is* the holy grail. Although there is huge potential, it seems presumptuous to assume that users will prefer it over current solutions and even more presumptuous to assume that the technology from PARC is the best of breed in the space. It is especially hard to appreciate the value without seeing anything (as pointed out by an earlier poster), and as mentioned by Norvig there are numerous ways to approach natural search, indicating that having the PARC technology licensed “out from under Google’s nose” didn’t have them losing sleep at night.
Once the technology is released to the public and there are meaningful results, I will be the first in line to try it out and listen to customer reactions, but it seems a bit premature to knock a few points off of Google’s stock price.
-
John Ebbert said:
Interesting info - and a competitor to Google in search is good, no doubt. I am not convinced until I see the product, though.
-
Thomas Hawk said:
If their technology really works, which is a big if of course (remember when Riya claimed to do facial recognition, now they sell shoes), then I’d think that Yahoo, Microsoft or IAC should be even more interested in acquiring the company than Google.
-
Blake Carrington said:
Google has a huge natural language team, as does Yahoo, but the challenge they have is a user base trained on using two and three word search strings. Even if they added awesome NL, no one would use it.
That said, as “great” as Google’s results are now, I have no doubt that in a few years, people will look back and think about how primitive the page rank idea is.
-
Matt Marshall said:
Thanks, on typo. Fixed.
-
Steve Morsa said:
As promising as their natural language platform sounds; and it does; the greatest threat to Google’s growing hegemony in the search/paid search arenas…given that about 1/2 of all searches are known to be for products and services…may actually spring from patent pending (#11/250,908) paid match, which will target people’s actual demographic and psychographic traits and characteristics (keytraits) instead of just the words we all type into little search boxes.
Though, like Powerset’s, paid match is not yet an operating system, our own US Dept of Labor does run a very popular service (over 500,000 users/month) which provides an enlightening and instructive peek at the potential that such a paid match search/ad platform possesses.
Called GovBenefits (available at govbenefits.gov), it utilizes a personal profile and a match engine to determine what government benefit programs people qualify for.
Were such a system populated with the 100’s of thousands to millions of products and services companies provide nation/worldwide instead of just the 400-odd government programs it includes now, one can only imagine what its public popularity would be…
…and with the world’s advertisers having the ability to pinpoint target and control; via bidding directly on those keytraits most relevant and applicable to their products and services, exactly who sees their ads (goodbye click fraud); one can also only imagine the deleterious effects that such an elegant and superior system/platform would have on a 95% PPC income dependent company like Google…
-
ronald said:
Who acquired IBM?
I didn’t know IBM was acquired.
Or what’s the difference between my company acquiring a computer from IBM and somebody else acquiring a business unit?
Now if I make up an article about my company acquiring IBM, will this article turn up as a result? Page Ranking would maybe list it as the 1m link, since nobody will believe it and link to it.
In other words, NLP without shared context is just random sentences used in a conversation. Or you have a lot of explaining to do about what you mean and what you trust. Or, for a right wing nut, liberal has a totally different meaning than for a left wing nut. So while Page Rank includes some of the wisdom of the crowd, how does NLP do that? Language is very subjective. In other words, your client system has to share your context with the server, if you want to stay sane and get some things done. But we are a few years away from moving from pretty pictures in the UI to context-driven machines. But less than 5 years, if somebody would ask me; nobody does, by the way. Don’t know why :-).
-
Sean Wilson said:
Interesting article. NL will be a useful tool, but as the amount of data available on the web grows faster over time, I think it will be even more useful in limited contexts instead of for broad-based search engine applications.
Personally, I’d like to see a market for elite managed directories that those listed pay to participate in with an account manager handling a couple hundred sites…all of which are highly relevant and vetted for a particular niche. A hundred such niches in a directory that a business or journalist could subscribe to and get quality content, answers, and results that would serve not only as quality research but even immediately actionable intelligence would be kind of cool.
Combine that with Yahoo! Pipes and you could make for a powerful model. The problem isn’t with search technology so much as it is with content quality. While it might sound a bit retro, I wouldn’t mind seeing a return of edited directories combined with the new NL technology. I’d pay a subscription fee for vetted results, and depend on Piped data that met very specific requirements for outside information and news from XML feeds.
Some day when Google bombs are gone, I might buy into the notion that search is anything to lose sleep over. Word of mouth and social networking are still more intriguing than search to me, perhaps because I still put more credence in WOM than any other marketing tool. Pipes might help change the status quo for mining data effectively from RSS feeds. So might this NL effort by Powerset do the same for search. I won’t hold my breath hoping.
Anyway…great post.
-
tbee said:
don’t Autonomy, the British software company, have natural language search? they claim to have meaning-based computing. consumers largely see it through Blinkx, which licenses their tech. otherwise it’s mainly a business thing. but still, aren’t there going to be some patent infringement issues here?
-
Tim said:
“…search engine technology it says may help propel it past Google.”
Propel it past Google in what way? In that they deliver better results? In that they’ll have more users? In that they’ll make more money?
I could say that MSN has better search relevancy than Google, but does that mean everyone’s going to switch tomorrow?
Google is a powerhouse because of their advertising. Their search relevancy is just good enough. What sort of money printing machine is Powerset building? There’s no mention of it.
Ronald’s point about shared context is extremely relevant. I would say Google’s answer to this is personalized search. It’s just in its infancy, but it’s a start. I would say that establishing that relationship is at least as important, and probably more important, in maintaining users. How long will it take to establish a relationship with Powerset, so it understands what the heck you’re asking it? What’s in it for users to give it that chance?
-
Michael Belanger said:
Powerset is a late comer and far behind others in the NLP search tech space. NLP has always thrown away context to fit SQL database calls. A fundamentally new database architecture is required (patents filed as early as 1994) to use every scrap of context expressed by well-articulated needs (the query). You can experience an award-winning NLP enterprise search offering (activated in 2005) at Boston’s Children’s Hospital’s Center for Media and Child Health - http://www.cmch.tv - go to their “research” page and experience “Smart Search.” This NLP engine encourages (for highest precision) an everyday conversational query of unlimited length and complexity, including “user jargon” of ten social science professional domains.
The next and final (post Google/Powerset) achievement in breakthrough user experience will be Jarg Corporation’s Semantic Knowledge Indexing Platform (SKIP) launch, mastering “NOP” Natural Object Parsing that co-populates “well-understood native object content fragments” in the same master index with NLP-graph fragments. This final step - using conversational style requests (over a cell phone or keyboard) - will provide total information awareness associated with the “role” of the user, as derived on the fly from the full context of the request’s information needs. Only relevant knowledge will be considered, and the more contexts in the request, the more highly personalized will be the returns-ranking. These returns will be a “collage,” ranked by fit-to-context, of image segments, fragrances, text, structure segments, music segments and all forms of knowledge with precise contextual relation to your on-the-fly needs - fit to your “user’s role” of the moment. Jarg will be seeking its very first institutional capital starting in March 2007. Jarg has incorporated Semantx Life Science, Inc., Care Commons, Inc. and Preemptive Alert Corporation to become best of breed in their verticals.
-
John said:
Don’t confuse “natural language” and “linguistics”. They are quite different.
-
Mihran Shahinian said:
They also assume that users are going to type in full NLQ sentences.
From experience, most end users put 2-3 keywords inside the search box.
-
Kind And Thoughtful said:
This is a great country. Ideas can come from anywhere. They may be instantaneous or take years to develop. Then, with the persistence and financing, they are brought to the world.
-
Bob New said:
My all time favorite search story is about when I was researching a programming problem. I had typed in a line of compiler generated assembly language that had generated an error. The first page returned by the search contained the answer I was looking for.
Of course being taken to the exact page was the result of entering a very specific query. I had entered a fairly long list of query terms each of which have pretty low frequency in the total web content. (And of course I was lucky that that page did in fact exist).
To me the value in search is finding the one single page that has the exact detail you are looking for. In my example above, this occurred because of the number of explicit search terms I had entered. But most often, I have to manually search through many pages until I find what I am looking for.
Eventually search capabilities will improve to give users more targeted search results. By targeted, I mean a better threshold for what is relevant to the query than just “this page contains the words you asked for”. Eventually the search threshold will be “this page contains the MEANING you asked for”.
In time, linguistics will be applied to help improve search capabilities. Things like word-sense disambiguation, word synonyms and sentence structure will likely all play in this.
-
Caribbean Guy said:
There are three major problems with this approach. First, it is technically difficult to do it well. Second, it isn’t necessarily going to be more appealing to users, because they need to use more keystrokes. Third, it tends to create a higher expectation level for users, so users will tend to be more critical of the results — any failure to correctly interpret and answer their question tends to be very noticeable. In contrast, the simplistic keyword approach takes less typing, it is easier for the Search Engine to accomplish, and because of the crude manner in which the words are typed into the box, users tend to dismiss failures as an indication that they failed to type the right keywords.
-
Vivek Juneja said:
Hi,
I believe in what the above person says. That has been the major bottleneck when developing any NLU engine. I myself developed an NL interface to an operating system some time back; it worked correctly until users started expecting more and the minor mismatches were blown out of proportion, reducing people’s confidence in the system.
-
Enrique Torrejon said:
Hi,
Regarding the hype of PowerSet and the NLP technology they are developing, it is not that new!
At Bitext, we have been developing natural language search engines for more than 4 years now. These engines allow users to query in natural language to any computer application, including the web.
If you would like to see how it works, you can visit our demo of NaturalFinder integrated with MSN Search for English at http://demos.bitext.com/MSNen/frames.htm
Our technology can be easily integrated with any search engine like Google Search Appliance, Autonomy, dtSearch, Lucene, etc.
For more info, do not hesitate to contact us!
Good day to all!
-
lisa lisa said:
I wasn’t aware of any big NL groups at Yahoo when I worked there. Also, analyzing RDF triples, while progress, isn’t exactly “deep” linguistic analysis although there are now db technologies (oracle, franz) who are supporting large data stores which will help inch this technology along.
-
Joe Duck said:
As Don Dodge noted recently, you only need 1% of the search market to do very well. Powerset is poised to do well, and could even be the “killer search application,” though it’s too early to say if they can beat Google at the game Google has played so well for many years.
-
Prakash pimpale said:
I don’t agree with Norvig completely when he says natural language technology will not be practical any time soon. I agree with the strength of the statistical approaches used for natural language processing, but they can’t stand alone.
Now take the example of Google’s machine translation system (specifically, I will talk about English to Hindi): it’s statistical but doesn’t do any sensible translation. On the other hand, you can see an English to Hindi machine translation system called MaTra doing much better than it. You can try it here: http://202.141.152.9/matra/index.jsp . This seems to be much better than Google’s. And so I think the guys at Google shouldn’t ignore the other growing giants, as only today I read that dinosaurs vanish one day…!