<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Wikia to launch new social search engine, more on Monday</title>
	<atom:link href="http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/feed/" rel="self" type="application/rss+xml" />
	<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/</link>
	<description>News About Tech, Money and Innovation</description>
	<lastBuildDate>Wed, 25 Nov 2009 20:47:38 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Overheard: Why do we need Search Wikia? - Overheard in the tech blogosphere</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-721156</link>
		<dc:creator>Overheard: Why do we need Search Wikia? - Overheard in the tech blogosphere</dc:creator>
		<pubDate>Tue, 08 Jan 2008 17:23:42 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-721156</guid>
		<description>[...] wrote earlier: We’ve tested Grub, the service’s way of crawling the Internet’s web sites to collect data. [...]</description>
		<content:encoded><![CDATA[<p>[...] wrote earlier: We’ve tested Grub, the service’s way of crawling the Internet’s web sites to collect data. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: VentureBeat &#187; Search Wikia gets slammed, here&#8217;s our review</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-720330</link>
		<dc:creator>VentureBeat &#187; Search Wikia gets slammed, here&#8217;s our review</dc:creator>
		<pubDate>Tue, 08 Jan 2008 04:06:51 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-720330</guid>
		<description>[...] covered some of its business and social aspects and an initial look when it first launched, but have since had more time to [...]</description>
		<content:encoded><![CDATA[<p>[...] covered some of its business and social aspects and an initial look when it first launched, but have since had more time to [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Search Engine Optimization Direct &#187; Blog Archive &#187; Wikia to launch new search engine January 7</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-719084</link>
		<dc:creator>Search Engine Optimization Direct &#187; Blog Archive &#187; Wikia to launch new search engine January 7</dc:creator>
		<pubDate>Mon, 07 Jan 2008 06:58:16 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-719084</guid>
		<description>[...] Per and Susanne Koch article is brought to you using RSS feeds. Here are some of the top articles on search engine optimization. Search Wikia, the highly anticipated search engine by Wikia Inc., Jimmy Wales’ (Wikipedia’s co-founder) for-profit company, will launch publicly on January 7 and is currently in private testing mode. We tested the service’s distributed &#8230; [...]</description>
		<content:encoded><![CDATA[<p>[...] Per and Susanne Koch article is brought to you using RSS feeds. Here are some of the top articles on search engine optimization. Search Wikia, the highly anticipated search engine by Wikia Inc., Jimmy Wales’ (Wikipedia’s co-founder) for-profit company, will launch publicly on January 7 and is currently in private testing mode. We tested the service’s distributed &#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: VentureBeat &#187; Search Wikia launches: Will it threaten Google?</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-718972</link>
		<dc:creator>VentureBeat &#187; Search Wikia launches: Will it threaten Google?</dc:creator>
		<pubDate>Mon, 07 Jan 2008 05:03:43 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-718972</guid>
		<description>[...] Jimmy Wales, who has led online encyclopedia Wikipedia to considerable success (see our previous coverage). We spoke with Wales (pictured here) under embargo last week about his [...]</description>
		<content:encoded><![CDATA[<p>[...] Jimmy Wales, who has led online encyclopedia Wikipedia to considerable success (see our previous coverage). We spoke with Wales (pictured here) under embargo last week about his [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saumil Mehta</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-718468</link>
		<dc:creator>Saumil Mehta</dc:creator>
		<pubDate>Sun, 06 Jan 2008 18:35:08 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-718468</guid>
		<description>Chris,

The need to index the entire web stems not from the number of results, but from the ability to serve an essentially infinite number of queries. For example, you can index 100 million pages and just serve queries related to Entertainment or Sports. But then you can&#039;t compete with Google anymore!</description>
		<content:encoded><![CDATA[<p>Chris,</p>
<p>The need to index the entire web stems not from the number of results, but from the ability to serve an essentially infinite number of queries. For example, you can index 100 million pages and just serve queries related to Entertainment or Sports. But then you can&#8217;t compete with Google anymore!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-718440</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Sun, 06 Jan 2008 17:29:55 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-718440</guid>
		<description>How many people search beyond the first 2-5 pages? What is this need to index everything?</description>
		<content:encoded><![CDATA[<p>How many people search beyond the first 2-5 pages? What is this need to index everything?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saumil Mehta</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-718062</link>
		<dc:creator>Saumil Mehta</dc:creator>
		<pubDate>Sun, 06 Jan 2008 00:40:59 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-718062</guid>
		<description>Well, I do understand how Grub fits into Wikia from a high-level standpoint. They need to get a substantial crawl going to build a real search engine and they don&#039;t have Microsoft&#039;s and Yahoo&#039;s millions to start from scratch.

As to your other question, the Grub website is fairly light on docs about how they actually do the crawl and doesn&#039;t have the source code posted for people to tinker with. I expect that to change relatively soon.</description>
		<content:encoded><![CDATA[<p>Well, I do understand how Grub fits into Wikia from a high-level standpoint. They need to get a substantial crawl going to build a real search engine and they don&#8217;t have Microsoft&#8217;s and Yahoo&#8217;s millions to start from scratch.</p>
<p>As to your other question, the Grub website is fairly light on docs about how they actually do the crawl and doesn&#8217;t have the source code posted for people to tinker with. I expect that to change relatively soon.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TS</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-718034</link>
		<dc:creator>TS</dc:creator>
		<pubDate>Sun, 06 Jan 2008 00:06:08 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-718034</guid>
		<description>Actually, access to local I/O is usually NOT the main bottleneck in crawling, assuming of course that one does not write out each retrieved page in a separate random write; this depends on the crawler architecture that is used. Of course, once you crawl tens of thousands of URLs per second, almost everything becomes a bottleneck, but there are only a few dozen players that have a need for that kind of speed.

The main bottlenecks in large-scale crawling (in this order) are probably crawl management (i.e., the human/software complexity side of managing and scaling a crawl to millions of hosts) and bandwidth. CPU power is not a major issue. Grub basically harvests bandwidth from clients.

But I would be concerned about the crawl management part of the Grub approach - are they using a fairly brute-force approach to recrawling that wastes (other people&#039;s) bandwidth, as opposed to the smarter recrawling strategies used by the major engines? How do they deal, e.g., with requests by sites to immediately cease crawling a domain (due to possible or perceived misbehavior of the crawler or local problems at the site)? And it is not clear how Grub really fits into the whole Wikia approach.</description>
		<content:encoded><![CDATA[<p>Actually, access to local I/O is usually NOT the main bottleneck in crawling, assuming of course that one does not write out each retrieved page in a separate random write; this depends on the crawler architecture that is used. Of course, once you crawl tens of thousands of URLs per second, almost everything becomes a bottleneck, but there are only a few dozen players that have a need for that kind of speed.</p>
<p>The main bottlenecks in large-scale crawling (in this order) are probably crawl management (i.e., the human/software complexity side of managing and scaling a crawl to millions of hosts) and bandwidth. CPU power is not a major issue. Grub basically harvests bandwidth from clients.</p>
<p>But I would be concerned about the crawl management part of the Grub approach &#8211; are they using a fairly brute-force approach to recrawling that wastes (other people&#8217;s) bandwidth, as opposed to the smarter recrawling strategies used by the major engines? How do they deal, e.g., with requests by sites to immediately cease crawling a domain (due to possible or perceived misbehavior of the crawler or local problems at the site)? And it is not clear how Grub really fits into the whole Wikia approach.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saumil Mehta</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-717250</link>
		<dc:creator>Saumil Mehta</dc:creator>
		<pubDate>Sat, 05 Jan 2008 03:00:54 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-717250</guid>
		<description>Actually, we should clarify &quot;distributed search crawler&quot; some more. I would assume that anyone who wants to crawl splits the crawl across n machines. I wasn&#039;t aware of your point about local IO being a bottleneck.

I am unable to find the Grub source code or a whole lot of technical literature on the system, but that is going to be corrected very soon, I&#039;m told.</description>
		<content:encoded><![CDATA[<p>Actually, we should clarify &#8220;distributed search crawler&#8221; some more. I would assume that anyone who wants to crawl splits the crawl across n machines. I wasn&#8217;t aware of your point about local IO being a bottleneck.</p>
<p>I am unable to find the Grub source code or a whole lot of technical literature on the system, but that is going to be corrected very soon, I&#8217;m told.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Burton</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-717194</link>
		<dc:creator>Kevin Burton</dc:creator>
		<pubDate>Sat, 05 Jan 2008 02:03:50 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-717194</guid>
		<description>The biggest issue with building a crawler is not the bandwidth; it&#039;s scaling and having fast access to local IO.

With Spinn3r:

http://spinn3r.com

We have a distributed crawler but we run it within our own cluster because having 10k clients wouldn&#039;t really buy us anything.

Of course maybe from Wikia&#039;s perspective this is just a blind HTTP fetch task and they then aggregate it locally within their cluster.

The comparison to Wikipedia might fall down here. Wikipedia has about 1.5M English pages. The net has billions.

You can&#039;t just rely on humans for this stuff.

Kevin</description>
		<content:encoded><![CDATA[<p>The biggest issue with building a crawler is not the bandwidth; it&#8217;s scaling and having fast access to local IO.</p>
<p>With Spinn3r:</p>
<p><a href="http://spinn3r.com" rel="nofollow">http://spinn3r.com</a></p>
<p>We have a distributed crawler but we run it within our own cluster because having 10k clients wouldn&#8217;t really buy us anything.</p>
<p>Of course maybe from Wikia&#8217;s perspective this is just a blind HTTP fetch task and they then aggregate it locally within their cluster.</p>
<p>The comparison to Wikipedia might fall down here. Wikipedia has about 1.5M English pages. The net has billions.</p>
<p>You can&#8217;t just rely on humans for this stuff.</p>
<p>Kevin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wikia on VentureBeat &#171; Life in the Bit Bubble</title>
		<link>http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/comment-page-1/#comment-716956</link>
		<dc:creator>Wikia on VentureBeat &#171; Life in the Bit Bubble</dc:creator>
		<pubDate>Fri, 04 Jan 2008 22:41:56 +0000</pubDate>
		<guid isPermaLink="false">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/#comment-716956</guid>
		<description>[...] Here it is if you&#8217;d like to follow the link and read it: http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/ [...]</description>
		<content:encoded><![CDATA[<p>[...] Here it is if you&#8217;d like to follow the link and read it: <a href="http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/" rel="nofollow">http://venturebeat.com/2008/01/04/wikia-to-launch-new-search-engine-jan-7/</a> [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
