VentureBeat

Posts Tagged ‘co:Search-Wikia’

It seems that all major new search engines undergo a somewhat similar birth. For months before they’re seen, they’re hyped, and anticipation builds to a fever pitch. The phrase “Google Killer” is inevitably bandied about. Then they’re released … to mass disappointment. The crowd disperses, at which point the true work can begin.

That’s more or less what happened to Search Wikia, the commercial counterpart to Wikipedia. Search Wikia proposes to improve search results through direct human editing, rather than using fully automated technology like Google or the newer semantic and natural language search engines. Released this January, Search Wikia was almost universally panned for offering poor results (though we were a bit nicer).

A new release today is adding a slew of features to the engine (some of which we previewed back in April). Users can now edit results extensively. They can modify the title and summary (seen below), add pictures or content from the page the result leads to, write comments, delete or hide results, or if they’re feeling less motivated, just give a star rating. All changes are closely tracked and logged. Helpfully, Search Wikia is also providing a quick way to switch to other engines if the results of a search aren’t good.



Those features address some pain points initial users had. Search Wikia’s initial editing tools were conceptually descended from Wikipedia, but Wikipedia isn’t known for ease or speed of editing.

But there’s a big caveat to all the new features: They only tangentially address the issue of search result quality. For that, there are only two real factors: The underlying search technology (i.e. Google-style automation), and human sweat equity. New tools may attract more users or make their work more effective, but in the end, Wikia just needs time to mature — not always an abundant resource in the fast-moving landscape of the Internet.

That’s why Mahalo, another human-powered search engine, has modified its approach several times, moving its focus to comprehensive pages built around subject areas — a strategy that appears to have led to faster growth. However, when I talked to Wikia (and Wikipedia) founder Jimmy Wales about Search Wikia, his faith in the idea of social search didn’t seem shaken by the experience of launching the product.

Wales says the next step for Wikia is working more on the automated back-end, which is based on the open-source Nutch and Lucene search engines. Following this release, the company will turn its focus away from user tools to customizing its internal tools and growing its index, which stands at around 30 million pages.

Another focus will be adding a “widget framework” for people to do specialized searches. Wales gave the example of searching for a zip code and coming up with weather results — a scheme that sounded a bit like the search-by-vertical approach that semantic engine Hakia started this year.

However, having seen the efforts of Hakia, Powerset and other new engines, Wales says he’s confident that social search is still the way. “I haven’t seen a lot of cases where [semantic search] improves anything anyone actually cares about. The right approach to search is to let computers do what computers do well, and humans do what humans do well,” he told me.

You can try out the new Search Wikia here. Let us know what you think in the comments, below.

When the guy behind Wikipedia launches a search engine, the world is going to watch. And watch they did when Jimmy Wales unveiled Search Wikia in January — perhaps a little too closely. I say that because while some were expecting to see a “Google-killer,” the site we saw was a bare-bones engine in the very early alpha testing stage.

But now, it’s getting closer.

I got a chance to play with some of the changes coming to Search Wikia. Those hoping for a more Wikipedia-style approach to search results will not be disappointed. You can test some of these features out for yourself at this link, but be forewarned that this is a testing site that may experience performance delays and bugs.

The main page is still something you’d expect on any engine: a search box. It’s after the query, however, that things get interesting. When you hover over each result returned, you have the option to ‘Edit’, ‘Spotlight’, ‘Comment’ or ‘Delete’ the item. Let’s run down these options:

Edit: As you’d expect, you click this and you can directly edit both the title of the result and the paragraph explanation that resides under it.

Spotlight: Allows you to highlight one result on a page, giving it a yellow background to make it stand out.

Comment: You can leave messages under every result to discuss that item/result. You can also leave comments about other comments.

Delete: You can remove any result you feel doesn’t fit the query.

All of these changes are saved and shown in the ‘Result History’ area on the site (which has its own RSS feed - nice). If you are not logged in, your IP address is the unique identifier to show who has changed what — just as with Wikipedia.
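To make the logging idea concrete, here is a minimal sketch in Python of how such a result history might be modeled. It is not Search Wikia’s actual code: the `ResultEdit` and `ResultHistory` names and all of their fields are hypothetical, chosen only to illustrate attributing each change to a username or, for anonymous users, an IP address.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ResultEdit:
    """One logged change to a search result (hypothetical structure)."""
    query: str
    url: str
    action: str                # e.g. "edit", "spotlight", "comment", "delete"
    username: Optional[str]    # None when the user is not logged in
    ip_address: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def actor(self) -> str:
        # Anonymous changes are attributed to the IP address instead.
        return self.username if self.username else self.ip_address


class ResultHistory:
    """Append-only log: entries are added, never overwritten."""

    def __init__(self) -> None:
        self._entries: List[ResultEdit] = []

    def record(self, edit: ResultEdit) -> None:
        self._entries.append(edit)

    def for_query(self, query: str) -> List[ResultEdit]:
        # Return every logged change for one query, oldest first.
        return [e for e in self._entries if e.query == query]
```

Keeping the log append-only is the design choice that matters here: a deleted result can be restored because the deletion is just another entry.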

One of the main problems people had with the initial launch of Search Wikia is that the search results simply weren’t up to snuff. While the company is quick to note that that’s probably still the case in this testing phase, just how much results improve after users edit them will be a test of the entire concept.

Editing links is one thing, but users can also submit their own. Adding related searches is also as easy as clicking the link to do so and typing in a relevant word.

Mahalo is a people-powered search site that has been rising in popularity. Its results return static pages with multiple links on a topic. While anyone can submit a link to include on these pages, and Mahalo has been encouraging this and more with its newer social tools, the pages are still for the most part built by one person — a Mahalo employee. Mahalo also monitors each link submitted to make sure it is not spam. [Full disclosure: I have done some work for Mahalo]

Search Wikia is attempting to take a more community-centric approach — not surprising given Wikipedia’s nature. You have a page of search results just as you would see on Google, but anyone in the world can edit and manipulate those results on-the-fly.

The obvious concern here is spam, gaming and the simple inaccuracies of such a system. The same issues arise from time to time on Wikipedia, but a group of users committed to the cause always seems to sort these things out. The fact that anyone can just as easily delete an item as create one, and that all of this activity is recorded in logs, makes this possible.

Search Wikia is still in its alpha testing phase, and as such things are still a bit rough around the edges. However, with this update we are finally getting a glimpse of Wales’ vision for the future of search. It is very promising. Test it out for yourselves.

The Search Wikia search engine has received mixed reviews, to say the least, since launching this morning.

We’ve covered some of its business and social aspects and an initial look when it first launched, but have since had more time to tinker with it.

The harsh reviews resulted in Search Wikia putting up a temporary disclaimer at its home page, http://search.wikia.com, conceding that the quality of its search results is poor (screenshot below). This was not the case last night when the landing page took users straight to a search box.

To find the search box, you now go to http://alpha.search.wikia.com/.

landing.jpg

We decided to try a very popular term and a very obscure term. The goal of the popular term test is to figure out whether the engine finds the most “obvious” sources and returns them in a reasonable order. Conversely, the obscure term is to figure out whether the search engine can actually scale to the vast universe of queries individual users are likely to try — like they would at major search engines like Google. Finally, we also tried a newsworthy query to gauge the freshness of the content returned.

Query: Las Vegas

We’ll assume Las Vegas is among the more popular travel-based queries in the United States. As an end user, I expected (personal expectations are key) to see a diversity of sources that would allow me to read about the city’s relevant attractions, show me travel and lodging options, recent and/or relevant reviews of user trips (blog posts) and news items related to Las Vegas. Here are Wikia’s results on the query (screenshot below). This results page is an example of what most new search engines must battle; it is not enough to have the best algorithm, best user interface or even the most comprehensive index. A search engine must deal with spam results, porn results and foreign language filtering. Not having these features available can completely kill relevancy of results.

vegas.jpg

Three out of the ten results below are foreign language results and completely useless to most U.S. users. The second result redirects to relocation information about Las Vegas and the site itself is not a major travel site like Y! Travel or TripAdvisor. Surprisingly enough, the Wikipedia page about Las Vegas is not among the top ten even though Wales confirmed to us that Search Wikia is “deep-crawling” Wikipedia and therefore likely to rank Wikipedia very high among search results.

A straight comparison to Google is unfair, so we won’t include screenshots, but click here to find Google’s take on “Las Vegas”.

Query: Apocrine Hidrocystoma of the eyelid

This is an obscure query but one that users actually enter into search engines. It is queries like these that expose a search engine’s comprehensiveness or lack thereof and test its ability to serve an essentially infinite universe of queries.

Here are Wikia’s results on the query (screenshot below).

obscure.jpg
This shows that Search Wikia’s index needs to grow substantially - and fast - if they want to achieve web scale. By comparison, we also checked out results from Google and Yahoo! Search. Note that both search engines find the same first result from eMedicine. Google finds 640 results while Yahoo! Search tops out at 152 results. By comparison, a popular term like Las Vegas generates hundreds of thousands of results.

Query: Barack Obama

This is an interesting query because not only is it extremely popular but also very newsworthy. As an end user, I expect to see Obama’s “official” website, some popular news stories about him, references to the books he has written or even his MySpace page. Most importantly, I would want to see some mention of his recent surprise victory in Iowa (again, personal expectations are absolutely critical).

Here are Wikia’s results on the query (screenshot below).

obama10.jpg

The results are better than the first two queries we tried. The first result is a blog site. From an end user’s point of view, this is clearly not the most “obvious” site about Barack Obama. Web analytics firm Compete.com reports that the site receives fewer than 1,000 visits a month, making it insignificant from a traffic point of view. Google also reports that the number of sites linking to the first result is only 60 (links to a site may also be roughly thought of as a measure of popularity of the site).

By contrast, click here to find Google’s take on “Barack Obama”. The first result (http://www.barackobama.com, the official site) has 6230 other pages linking to it. Also check out Yahoo!’s take on “Barack Obama”.

Conclusion

We applaud Search Wikia’s attempt to revolutionize the search space by creating an open source, freely licensable search engine with room for humans to contribute in the search results. In our review, we found all the problems we expected to find at such a nascent stage - spam, pornography, foreign language results and lack of coverage are all problems that Search Wikia needs to take on aggressively to build a killer product.

Saumil Mehta is a contributing writer for VentureBeat. Disclosure: He is product manager at Kosmix RightHealth (http://www.righthealth.com), which is also a search engine company. The opinions expressed here are his own.

[Update: See our subsequent post here, which includes a deeper look, after Search Wikia gets slammed by other initial reviews]

Search Wikia, a new search engine site, has launched publicly after two weeks of private testing.

The search engine has been highly anticipated for its unique, open-source approach to search as well as its high-profile founder, Jimmy Wales (pictured here), who has led the online encyclopedia Wikipedia.

We spoke with Wales under embargo last week about his plans.

This time, Search Wikia is a for-profit endeavor, part of Wikia, another company Wales co-founded. And it’s going after the biggest search engine of them all, Google.

According to Wales, Search Wikia’s primary innovation will be to tie a user’s social network - that is, information about the user and their friends - into search results. The idea is that a user and their friends share a common set of preferences and that using that information makes search results more personalized as well as more relevant. More on that in a second.

Here’s how it works: Users will see a familiar search interface (see screenshot below; I tried a search for “Warner Blu Ray,” looking for what Search Wikia has for results about Warner’s defection into the Blu Ray camp).

The twist, however, is that they can begin to shape the results without even registering for the service. First, users can influence search results by editing a Wiki section that appears at the top of each results page (see left red arrow below). This section is likely to be very similar to Google “OneBox” or Ask’s “Smart Answers” - a specially marked section at the top of the page that answers the user’s query without necessarily showing search results.

screenshot2.jpg

While Google’s OneBox is created algorithmically based upon the query (screenshot below), Search Wikia’s wiki results will rely upon its users - you and me - to edit the section. Since all search engines allocate a great premium to the first result, Search Wikia is effectively allowing its users to collectively control the most important result on the page. Search Wikia will use unregistered users’ IP addresses as a track record of such edits to prevent spammers from manipulating the section for irrelevant or malicious content.

google-warner.jpg

Similar to Digg, Search Wikia will also allow all users (including unregistered users) to vote upon individual results using 1-5 stars, and to flag results that the user finds inappropriate or irrelevant to the query. Wales said that in the near term, the service will not yet use the voting to influence actual results. The current plan is to simply view user behavior first, collect data about that behavior and find the most appropriate way to feed these votes back into the search engine’s ranking algorithms, he explained. Search Wikia certainly won’t be the first to try this since Google Labs recently ran an almost identical experiment for users with a Google account (screenshot below).
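The “collect first, rank later” plan can be illustrated with a short Python sketch. This is an assumption-laden illustration, not Search Wikia’s implementation: the `StarRatings` class and its methods are hypothetical. The point is simply that votes are validated and stored per query and result, while nothing in the sketch reorders results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


class StarRatings:
    """Collects 1-5 star votes per (query, url) without touching ranking."""

    def __init__(self) -> None:
        # Maps (query, url) to the list of star values submitted so far.
        self._votes: Dict[Tuple[str, str], List[int]] = defaultdict(list)

    def vote(self, query: str, url: str, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._votes[(query, url)].append(stars)

    def average(self, query: str, url: str) -> Optional[float]:
        # None means nobody has rated this result for this query yet.
        votes = self._votes.get((query, url))
        return mean(votes) if votes else None
```

Keeping the ratings store separate from ranking means the collected data can be studied before anyone decides how, or whether, to feed it back into the algorithm.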

Digg has had great success with its model of allowing users to propel the best stories to the front page, but it has also had to deal with spammers creating dummy accounts to artificially boost a story’s rank. Search Wikia plans to solve this problem by recording all clicks as public acts and using IP addresses or usernames to create a trail of actions. Wales also referred to Search Wikia’s reliance on a hybrid of algorithms and human intervention - as opposed to Digg’s complete reliance on user votes - as a way to eliminate the problem.

In addition to a search engine, Search Wikia will launch — also Monday — a full featured social network. Users that sign up for the social network (again, by registering) will have a profile page and the ability to befriend and message other users. At some point, we assume that a particular user’s search results will be influenced by the votes of the user’s friends within the Search Wikia social network. This approach to search is referred to as “social search” and other companies have tried their hand at it with limited success. Eurekster launched a search service in 2004 that ranks and re-orders search results based upon a user’s friends’ clicks. Today, Eurekster does not have a destination search site; instead, it builds social search into popular blogs and small websites. We tried a newsworthy technology query on TechCrunch’s site, which uses Eurekster search, and found that the results didn’t reflect our intended search — we wanted news about Warner’s defection into the Blu-ray camp (screenshot below). Yahoo! My Web, which launched in 2005, is also an attempt at social search - it allows users to bookmark particular results as well as invite their friends to be able to use their bookmarks for relevant results. While the service is still around, it has not been integrated into mainstream Yahoo! Search.
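The friend-based re-ordering described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Eurekster or Search Wikia actually rank results; the function name and the shape of the click data are assumptions.

```python
from typing import Dict, List, Set


def rerank_by_friend_clicks(
    results: List[str],
    clicks_by_user: Dict[str, Set[str]],
    friends: Set[str],
) -> List[str]:
    """Move results clicked by more of the user's friends toward the top.

    sorted() is stable, so results with the same number of friend clicks
    keep the order the underlying engine gave them.
    """
    def friend_clicks(url: str) -> int:
        return sum(1 for f in friends if url in clicks_by_user.get(f, set()))

    return sorted(results, key=friend_clicks, reverse=True)
```

Note that clicks from people outside the friend list are ignored entirely, which is what separates this from a Digg-style global vote.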

Wales has also held talks with Google Developer Advocate Kevin Marks, one of the evangelists behind Google’s OpenSocial initiative. Search Wikia will support OpenSocial, allowing developers that build social applications for Google’s Orkut or LinkedIn to easily run the same applications within the Search Wikia social network. Since some of the OpenSocial APIs are as yet , look for this part of the service to evolve over time.

techcrunch.jpg

Many new search engines also license search results from the major engines when they have no results to show for a particular query. Here is an example of Mahalo showing Yahoo and Google results because there isn’t a Mahalo page for the term. However, Wales confirmed that Search Wikia is not currently involved in discussions with third party search engines. He also confirmed that the service is currently performing a deep crawl of Wikipedia and that results from Wikipedia were likely to rank high in the beginning. Given these details and an index size between 50 and 100 million pages (compared to a Google index that is rumored to be well north of 40 billion pages), it is safe to say that it will be quite some time before Search Wikia can truly be a general purpose search engine. In the meantime, Search Wikia hopes that its approach to crawling the web - by using volunteers that download desktop client software called Grub - will allow it to build a comprehensive index in a relatively short period of time.

[Photo of Jimmy Wales by Andrew Lih]

Saumil Mehta is a contributing writer for VentureBeat. Disclosure: He is product manager at Kosmix RightHealth (http://www.righthealth.com), which is also a search engine company. The opinions expressed here are his own.

Search Wikia, the highly anticipated search engine by Wikia, the for-profit company of Wikipedia co-founder Jimmy Wales, will launch publicly on Monday. It is currently in private testing mode, and we’ll write more upon launch.

The huge success of Wikipedia in mobilizing humans makes this project particularly notable. It’s a fascinating alternative to Google’s computer-focused approach.

We’ve tested Grub, the service’s way of crawling the Internet’s web sites to collect data. Grub is a “distributed search crawler,” so named because it lets people download software to do the crawling from their own computers, thereby letting thousands of people contribute to the process. It is intuitive and easy to use. However, large questions remain about the ability of Search Wikia’s approach to scale to the entire Web.

Wikia, the parent company, already has a live service independent of its Search Wikia efforts. The current site hosts free wikis — areas of the site open for collaborative editing — for communities in an ad-supported model. The resulting topics covered are usually deeper in detail than the average Wikipedia article. Wikia’s wikis use the same software that powers Wikipedia.

Search Wikia will borrow the ideas and principles that have made Wikipedia so successful: strong community emphasis, transparency, freedom to contribute and free licensing.

Search Wikia’s lofty aspirations of transparency raise some very important questions about the ability of spammers to manipulate search results. All the major search engines guard their ranking algorithms closely in order to prevent such manipulation. It’s clear that Search Wikia will rely on the same kind of community monitoring and self-policing that have made the fully open Wikipedia increasingly popular in spite of the same threats. According to Jeremie Miller, the search project’s technology head honcho, as reported in a story by New Scientist, the search service will integrate wiki-like tools to improve search. The ability to vote on search results is an example of such social tools.

Search Wikia will also rely on a cadre of volunteers to help it crawl the web with the Grub distributed web crawler. The Grub client is a consumer desktop application that harnesses spare CPU cycles on volunteers’ machines and crawls a small portion of the Web. The New Scientist article informs us that the January 7 launch product will have an index of approximately 100 million pages. Given the size and scale of the Web, this is a relatively unimpressive number and quite possibly not big enough to cover one vertical (say, Sports or Health), much less the horizontal universe of queries that any general search engine must be prepared to handle. That being said, widespread usage of Grub by hundreds of thousands of volunteers and an index that actually scales to the Web would be a disruptive development and a new way to think about search.

We tested the Grub crawler client (screenshots below) on a dual core Lenovo ThinkPad T60 laptop running Windows XP. The download and install process was a snap even though the Windows client is running in TEST mode and is expected to be buggy. We ran Grub using a Comcast cable connection for an hour and found that it crawled pages alphabetically by domain name. We also found that the Grub client accessed previously crawled pages to ensure freshness of content and updated the page only when required. We don’t know yet how the Grub system decides which URLs to crawl. It would also be interesting to see published estimates from Search Wikia on how many client installations it takes on average to build a crawl of, say, 30 billion URLs.
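The “revisit, but update only when required” behavior we observed can be sketched in Python. To be clear, this is not Grub’s code and we don’t know how Grub detects changes; comparing content hashes is our own assumption, as are the `recrawl` function name and the injected `fetch` callable (which stands in for an HTTP download).

```python
import hashlib
from typing import Callable, Dict, List


def recrawl(
    urls: List[str],
    index: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> List[str]:
    """Revisit URLs and update the stored fingerprint only on change.

    `index` maps URL -> SHA-256 hex digest of the last crawled content.
    Returns the URLs whose stored entry was added or updated.
    """
    updated = []
    # Sorting the URL strings is a crude stand-in for the
    # alphabetical-by-domain order we saw in our test run.
    for url in sorted(urls):
        digest = hashlib.sha256(fetch(url)).hexdigest()
        if index.get(url) != digest:
            index[url] = digest
            updated.append(url)
    return updated
```

For example, calling `recrawl` twice in a row against unchanged pages updates everything on the first pass and nothing on the second.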

We also anticipate that Search Wikia will rely on the same type of developer community that created world-class open-source projects like Mozilla Firefox and Linux. An April 2007 article in Fast Company says developers have been enthusiastic about being able to tweak complex search algorithms in an open-source environment. It’s easy to imagine a lot of talented developers wanting to try their hand at a problem that’s technically challenging on several fronts (see Anna Patterson’s article: Why Writing Your Own Search Engine Is Hard).

However, this is where the comparisons to Wikipedia become less believable. Wikipedia’s model of allowing anyone to edit pages around particular topics has been successful in part because everyone considers themselves expert enough to contribute cogently on a few topics. The same cannot be said of search technology. It’s unclear whether this community can deliver to the requirements of a web search engine.

Finally, there’s the question of organic traffic to the service. Wikipedia sites constitute the eighth largest set of properties on the Web according to Internet analytics firm ComScore. Wikipedia is a certified Internet brand, but as of December 2006, Google accounted for 50 percent of its incoming traffic, barring certain caveats (see Rich Skrenta’s post for more). It sounds unlikely that Google will send the same volumes of traffic to a competing search service. This also means that Wikia must face the unappetizing task of getting users to switch ingrained search behavior and start their Web surfing at a site other than Google.

Note also that another recent company that started life as a human-powered search engine, Mahalo, seems to be relying on Google SEO for distribution and traffic. As of December 29, 2007, Google has indexed 79,600 pages from the domain Mahalo.com, quite possibly as acknowledgment of the difficulties of driving organic traffic to a competing search engine. Over the next few days, we also plan to investigate questions around the company’s business model, its organizational structure for community developers, the set of social features it will launch with, and when it expects to scale to be able to serve a large portion of queries well.

wikisearch2.jpg

wikisearch3.jpg

Here’s the latest action:

1. Search Wikia could go live by Christmas, to take on Google
2. Nick Denton, the nemesis of Silicon Valley, makes himself editor of Gawker
3. DocStoc, fresh with cash, gives away $50 Amazon gift certificate daily to user uploading best quality docs
4. AdReady, the Seattle advertising startup offering online banner ads, raises $10M more
5. Why Google is going after Wikipedia
6. EyeQ, tracks contributions of developers to projects
7. Ideeli, an invite-only Web retailer for high-end goods, gets $3.8M

wikiasearch.jpg

Search Wikia could go live by Christmas — Just as Google starts to experiment more with social search, the open-source search engine called Search Wikia, backed by Wikipedia founder Jimmy Wales, could go live as early as this week, according to New Scientist. Unlike Google, Search Wikia will not share search data with advertisers, nor encroach on privacy by storing users’ search terms. A reported 500 volunteers are running web-crawlers to compile Search Wikia’s web index, which so far totals 100 million pages, according to the report. (See our earlier coverage here).

Nick Denton, the nemesis of Silicon Valley, makes himself editor of Gawker — His Valleywag blog has harassed the high-and-mighty in Silicon Valley with constant gossip. But Valleywag is just one of many blogs in Denton’s empire, and now he wants to take the company to the next level.

DocStoc, fresh with cash, gives away $50 Amazon gift certificate daily to user uploading best quality docs — DocStoc, the new company which faces fierce competition with Scribd to become a destination for online documents, has finished raising its seed round of $750,000. The latest investor is Matt Coffin, of LowerMyBills.com. So now DocStoc can afford to give $50 certificates for users who upload the best quality content to the site each day.

AdReady, the Seattle advertising startup offering online banner ads, raises $10M more — The company builds ads for various categories of sites, from real estate to travel. Advertisers in these sectors get to customize these ads, and then insert them into large ad networks where they will be displayed on relevant Web sites. Bain Capital and Khosla Ventures led the investment, together with existing investor Madrona Venture Group. The company was founded last year by former Classmates.com executives. The company, which launched in October, says it has about 1,000 customers. It takes a 20 percent commission on the campaigns. (More in-depth coverage by John Cook)

Why Google is going after Wikipedia — Here’s why Google is going after Wikipedia, by creating Knol, a feature that lets people contribute entries about objects and things. Hitwise shows the phenomenal growth of Wikipedia, which is dependent on Google for just more than half of its traffic.

EyeQ, tracks contributions of developers to projects: SourceKibitzer, an Estonian company based out of St. Petersburg, Russia, has launched EyeQ, a tool for analyzing the contribution of developers to individual projects. Through proprietary algorithms this software computes each developer’s “know how,” contribution, and the complexity of every statement written. Mark Kofman, co-founder, saw the need for “multi-location software development team” analysis in his own experience as a developer for an outsourcing provider. The other co-founder, Anton Litvinenko, developed the basis for the algorithms EyeQ uses in his studies at the University of Tartu, located in Estonia. SourceKibitzer was angel funded and will be looking for venture capital in order to expand.

Ideeli, an invite-only Web retailer for high-end goods, gets $3.8 million — This new service has the feel of gimmickry to it, though its exclusivity might appeal to high-end consumers. Ideeli features luxury products typically valued at between $200 and $700, and it informs its members when products are about to go on sale the coming week. The New York company notifies its members both on the Web and with SMS alerts. Members, so far about 10,000 of them, get to see the sales prices, but only those members who pay a $7.95 monthly subscription see the sales early, and so have an advantage in the mad dash to buy while quantities last. The company launched in May. The funding comes from Kodiak Venture Partners and angel investors. Oscar De La Renta and Baccarat have signed up as partners.
