DEMO: 80legs promises web crawling for the masses

In tech, it’s trendy to offer something “as a service” — to deliver it online, usually for a subscription fee. We’ve seen software-as-a-service, platform-as-a-service, data-as-a-service, and more. Now a startup called 80legs says it’s selling web crawling as a service.

Web crawling means browsing and indexing online content in an automated fashion. Its most prominent use is in creating the databases for search engines like Google, but it’s important for anyone who needs to locate content on the web, such as a movie studio that wants to find pirated footage, or an ad network that wants to see where its ads are being placed. For now, the main options are to build your own web crawler, usually using your own data center, or to take advantage of online services like Amazon Elastic MapReduce, says 80legs chief executive Shion Deysarkar.

It’s much easier to use 80legs, which is launching today at DEMOfall 09, the technology conference co-produced by VentureBeat. You just make your choices from several menus, telling 80legs where you want it to crawl and what you want it to look for, and it returns a data file with your results. Innovations like using Plura’s distributed computing technology make 80legs more powerful, too. Deysarkar says that where Amazon can crawl 100 million pages per day, 80legs can crawl 2 billion. (I’ve included a comparison chart created by 80legs.) All of this makes web crawling accessible to smaller companies. Deysarkar compares the service to having a “mini-Google” at your disposal.

80legs is also opening an application store, where developers sell apps that further refine the web crawling results. For example, Deysarkar says, developers could sell apps that perform sentiment analysis, look for video fingerprints, or analyze sentence structure.

“Hopefully we can become sort of the de facto web crawling provider,” Deysarkar says. “No one’s thought about web crawling as a market. We want to control that market.”

80legs is charging $2 per million pages crawled and $0.03 per CPU-hour used. The Houston-based company has raised $400,000 from Creeris Ventures.
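
To get a feel for what those two rates add up to, here is a rough back-of-the-envelope sketch in Python. It assumes the two charges are simply summed, with no minimums, tiers, or other fees, which the company has not spelled out here. The function name and the sample job size are hypothetical, chosen only for illustration.

```python
# Published rates quoted in the article.
PRICE_PER_MILLION_PAGES = 2.00   # dollars per million pages crawled
PRICE_PER_CPU_HOUR = 0.03        # dollars per CPU-hour used


def estimate_cost(pages_crawled, cpu_hours):
    """Estimate a crawl bill in dollars.

    Assumption: the bill is the plain sum of the per-page and
    per-CPU-hour charges, with no minimums or volume discounts.
    """
    crawl_charge = pages_crawled / 1_000_000 * PRICE_PER_MILLION_PAGES
    compute_charge = cpu_hours * PRICE_PER_CPU_HOUR
    return round(crawl_charge + compute_charge, 2)


# Hypothetical job: 50 million pages and 1,000 CPU-hours of analysis.
print(estimate_cost(50_000_000, 1_000))
```

Under those assumptions, the hypothetical job comes to $100 for crawling plus $30 for processing.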
