Updated


If there’s anyone itching to take on Google, it is the two Indian guys who went to Stanford with Google co-founders Larry Page and Sergey Brin.

Meet Anand Rajaraman and Venky Harinarayan, two of the co-founders of Junglee, who twice seriously considered acquiring Google in its early days but decided their friend Brin was too bold, if not arrogant, to deal with.


Now they plan to officially launch an ambitious search engine company, Kosmix, at the Demo conference, which begins the week of Feb. 6 in Phoenix. They’ve also raised $7.4 million in venture capital.

They are making an audaciously risky bet that they can crack the code on a vexing problem in search: finding the meaning, or at least the topic of a Web page. “This is an unsolved problem on the Web,” says Harinarayan, from his office perched on the seventh floor of a Mountain View high-rise. His window commands a sweeping view of the valley, stretching out over toward the Googleplex, just three miles away.

It’s as though Harinarayan is still keeping an eagle eye on his erstwhile Stanford buddies.


More later on the friendly but competitive relationship with the Google guys. For now, Kosmix is betting its deep technology can help improve upon Google’s one-size-fits-all approach for many types of searches.

Google may work well when you’re looking for a specific answer. But what if there’s no one right answer? This is where Kosmix wants to help you, by searching the entire web, narrowing the results to the particular area you are interested in, and then giving you a choice of answers.

Kosmix isn’t the first or the latest to help users search by topic. You’ve got things like Become.com for shopping. You’ve got Mobissimo for travel, Trulia for real estate and Healthline for health, and so on.

But Rajaraman and Harinarayan claim most of these other sites crawl only 500 or so Web sites relevant to their niches. Kosmix, like Google and Yahoo, is crawling and indexing the entire Web. It has come up with its own technology to rank pages by category, instead of by keyword.

Let’s take an example Harinarayan gave us. Say you are suffering from an ACL injury, a common knee injury among skiers. Type “ACL” into Google, and you get mostly irrelevant pages. Try narrowing your search by typing “ACL knee” and you still get quirky results like one from www.financeprofessor. You might eventually find some good pages, but you’re often at a loss for what else is out there on your topic.

So type ACL into Kosmix’s health engine, and you’ll get relevant pages straight off, but also a helpful categorization of results along the left-hand column, for example: “definition,” “causes,” “treatments,” “blogs” and “message boards.” Harinarayan doesn’t mind boasting: “What you can get with five minutes at this site is a hundred times what you can get at Google.” It even provides a category for alternative medicine. Harinarayan remarks: “You wouldn’t even know to ask about that at Google.”


To organize its results, Kosmix doesn’t use pagerank, or popularity, based on the number of links to a page. Kosmix decided pagerank is inefficient when it comes to categories. “There is no affinity to topic, when you are ranking by raw popularity,” says Harinarayan.

Instead, Kosmix looks at what pages that link to other pages are saying — to take a bigger stab at judging the meaning or subject of the page. If the linking page is saying something similar to the page it links to, you can begin getting at its meaning, or at least muster up enough information to categorize it by topic. Harinarayan calls it “category rank.” Kosmix is essentially tagging pages with categories. “Auto-tagging the Web,” as Harinarayan puts it.
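To make the idea concrete, here is a toy sketch of link-based topic tagging. It is not Kosmix's algorithm, which the company has not published; it only illustrates the general intuition described above. Everything in it is an assumption made for illustration: the page names, the sample text, the seed categories, and the choice of bag-of-words cosine similarity as the measure of whether a linking page "is saying something similar" to the page it links to.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Very rough text model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def category_scores(pages, links, seed_categories):
    """Score each linked-to page per category.

    A linking page with a known (seed) category passes that category to
    its target, weighted by how similar the two pages' text is.
    """
    vectors = {url: bag_of_words(text) for url, text in pages.items()}
    scores = defaultdict(lambda: defaultdict(float))
    for src, dst in links:
        category = seed_categories.get(src)
        if category is None:
            continue  # the linking page has no known topic to pass along
        scores[dst][category] += cosine(vectors[src], vectors[dst])
    return scores


def best_category(scores, url):
    """Return the highest-scoring category for a page, or None."""
    cats = scores.get(url)
    return max(cats, key=cats.get) if cats else None


# Hypothetical pages and links, invented for this example.
pages = {
    "knee-clinic": "acl knee injury surgery treatment",
    "finance-site": "acl stock finance professor market",
    "acl-guide": "acl knee injury treatment recovery",
}
links = [("knee-clinic", "acl-guide"), ("finance-site", "acl-guide")]
seed_categories = {"knee-clinic": "health", "finance-site": "finance"}

scores = category_scores(pages, links, seed_categories)
print(best_category(scores, "acl-guide"))
```

In this toy data both sites link to the same page, but the health site's text overlaps with it far more than the finance site's does, so the page ends up tagged "health" regardless of how many links point to it.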

Kosmix has started with a health search, but will soon roll out travel and politics search, and will follow with a rolling thunder of scores of other types of searches, Harinarayan says.

This is not lightweight stuff. They have filed patents, and there’s tons of math in their algorithms, Harinarayan says. They’ve hired 20 people, including PhDs from Stanford and experts from IBM Almaden; their chief technology officer and vice president of business development both come from Yahoo. They’ve been working away for about a year.

They’ve raised $7.4M from Lightspeed Venture Partners and Cambrian, which is their own venture firm. They had an earlier seed round of $700K, which includes money from Amazon’s chief executive, Jeff Bezos.

(Update: Business model? Same as Google. Kosmix will run ads along the right-hand side, which it will start running more actively after the launch at Demo. Right now, it is running Google adwords.)

Harinarayan concedes Kosmix doesn’t have all the answers. “We’ve taken the first real leap at solving this problem,” he said. “We haven’t solved it entirely.” And there’s the conundrum of figuring out how to present all the “flavored” search options to users, once Kosmix rolls them out over the next few months, Harinarayan says.

It has been a twisting road for this duo, and the Google guys have always been there to taunt them. As we reported once before, Harinarayan shared an office with Brin while at Stanford, and remembers thinking he was one of the smartest guys at Stanford. Early on, Rajaraman’s cubicle in the common study area, known as the “Zoo,” was next to Brin’s. In 1994, Rajaraman proudly told Brin he’d acquired a new computer with the latest version of Microsoft Windows. Brin said Microsoft was “lame,” went over to Rajaraman’s apartment and installed Linux, a free open-source operating system then almost unheard of, on his computer.

Brin even took on Rajaraman’s practice of eating vegetarian, a family tradition. One evening, Brin went over to Rajaraman’s apartment, baked a fish in his oven, and served it to him with some lemon. Rajaraman ate it.

And the tough thing is, Brin was the young guy, the whippersnapper: “The joke was, we couldn’t go to bars with him, he was underage,” Harinarayan recalled during an interview a few years ago.

But the thing that may really stick in the Kosmix guys’ craw is how they almost merged with the Google guys in the early days. Rajaraman and Harinarayan were co-founders of Junglee, an early Web database company, and the Junglee guys were considered role models — they were the first in their Stanford department to launch an Internet company.

Harinarayan once mused to us: “I wish we’d gotten Sergey into Junglee.” It was close. They’d wanted to approach Sergey Brin, who was in the early days of forming Google with Page, and considered talks to acquire Google. Their advisor, Jeff Ullman, a professor at Stanford, had suggested it. But they never got very far because Brin was so brashly confident.

But they got their opportunity again when the Junglee guys, having been acquired by Amazon, came down in 1999 to talk more seriously with Google about an acquisition.

Rajaraman recalls how Google was still not very big, employing only around 50 people: “And we kind of asked, at that point, ‘Sergey, if Amazon were to buy you guys, what sort of price would you sell for?’ I remember Sergey telling me: ‘The only kind of price we’d accept would be something with ten digits [billions].’ If he’d said nine digits, we might have talked.”

Oops.

Anyway, after Amazon, Rajaraman and Harinarayan raised a small venture capital fund in 2000 and invested in several companies from their Mountain View office. They scored at least one big hit with Neoteris, and have several others remaining in their portfolio, which they continue to manage. But after investing most of that first fund, they’ve decided not to raise another one, having been bitten by the bug last year to launch Kosmix, Harinarayan explained.

————–

Update: Here’s some initial feedback from the experts.

Gary Price, over at Search Engine Watch, says he hasn’t looked at Kosmix yet, but he’s looked at clustering efforts. Determining the “aboutness” of a web page, book or article is something that librarians and catalogers have been “struggling with and writing about for years,” he notes.

Perhaps the biggest challenge that Kosmix has — and it is shared by similar theme-oriented engines such as Clusty, FirstGov, AskJeeves with their “Zoom” feature, etc — is to educate its users. “It’s cool stuff, but unless you know how to use it then it’s a waste of space,” Price says. “That said, as a search trainer, a little education goes a long way.”

As for Kosmix, Price says it looks similar to what Northern Light was doing.

“They had a staff of knowledge experts and librarians (I was a consultant) building the vocab and then letting the tech map the pages to the right category(s). Depending on the specificity of the vocab, building and maintaining one is a huge job. However, I for one am a strong believer in them for specialty tools. One thing they do need to do is offer what are called glosses, definitions of what different categories mean.”

Danny Sullivan, Gary’s colleague, expresses some skepticism:

“Overall, we think clustering is cool, but sadly, to date users aren’t really responding to it. They’ll just look at the main results. And while ACL at Kosmix looks great because they’ve told you to think about it in terms of a particular injury, over at Google, you can see it might actually mean other things.

“That means to grow and really compete with Google, they have to show you a wide range of matches then hope you will narrow down into topical areas. But if those matches aren’t good off the bat (where popularity ranking can help), then they’ve got problems.

“Overall, I won’t be surprised if the technology helps them roll out some high quality verticals. I’d be very surprised if they jumped in to really oust any of the major search players.”

Charlene Li, of Forrester Research, just got back to us, after sitting down with the Kosmix folks. She echoes Sullivan’s view that Kosmix might be well placed to offer some good topical search engines, because Google isn’t focusing on that. “I think they’re going to do great,” she said.

On the other hand, Kosmix probably won’t succeed in becoming a more general search, she said. “I don’t agree with them on that,” she said. “If they remain focused on verticals, they’ll have a fighting chance.” She also refers to the user-education problem. If you roll out too many specialized searches, a user gets paralyzed. If they’re looking for information on baby shoes, will they look at Kosmix’s health search, its fashion search, or “caring for kids” search?