Just as email spam has cemented itself as an unstoppable scourge in our daily digital existence, content scraping bots are polluting the web by slowing sites down, stealing content, and generally running amok. And until now, there hasn’t been much we could do about it.
Arlington, Virginia-based Distil believes it has the answer to bad bots: It has developed a Content Protection Network that detects bots and prevents web scraping. The company has been so successful that it recently surpassed 1 billion blocked bots since it started counting around seven months ago.
Bots affect pretty much anyone with a website. They scrape content from digital publishers like VentureBeat and repurpose it on their own ad-filled sites. They can also be used to gather intelligence about your business, handing your competitors an advantage. And they increase latency and server load on your site, which means a worse experience for legitimate visitors (and higher bills on your end).
Distil doesn’t just block IP addresses, which is typically the first course of action for online security companies. Its secret sauce is in how it identifies bots.
Rami Essaid, Distil’s founder and chief executive, points out that IP addresses change frequently, so they’re not the best way to stop malicious people. Instead, the company uses a variety of approaches to discover bots, including behavioral analytics, session rate limiting, and a unique method of fingerprinting visitors.
“We look at the combination of your unique browser running on your OS, the settings of the browser and settings of your OS, and we combine that into something that’s unique about you,” Essaid explained in an interview with VentureBeat. “This is trackable on the server side — in the age where Do Not Track cookies are starting to be a bigger and bigger deal, something you can track server side gives us control over who we track and how. … We’ve got 2-3 years before people start saying we’re invading users’ privacy.”
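Distil hasn’t disclosed exactly which signals go into its fingerprint, but the general idea Essaid describes — combining browser and OS characteristics into a single server-side identifier — can be sketched in a few lines. The specific headers below are an illustrative assumption, not Distil’s actual signal set, which is proprietary and presumably far richer:

```python
import hashlib

def fingerprint(headers: dict) -> str:
    """Combine client-identifying request headers into one stable hash.

    A toy illustration of server-side fingerprinting: the same browser
    configuration yields the same ID on every request, regardless of
    which IP address the request arrives from.
    """
    signals = [
        headers.get("User-Agent", ""),       # browser + OS version string
        headers.get("Accept", ""),           # content types the client accepts
        headers.get("Accept-Language", ""),  # locale settings
        headers.get("Accept-Encoding", ""),  # supported compression schemes
    ]
    return hashlib.sha256("|".join(signals).encode("utf-8")).hexdigest()

# Two requests with identical client settings map to the same ID.
req = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/21.0",
    "Accept-Language": "en-US,en;q=0.5",
}
assert fingerprint(req) == fingerprint(dict(req))
```

Because the hash is computed entirely on the server from data the client already sends, it works even when cookies are blocked or Do Not Track is enabled — which is precisely the point Essaid is making.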
Distil officially launched in April 2012, but the company didn’t actually start counting the bots it was blocking until several months later, Essaid tells me. Over the past year, he’s noticed some interesting trends thanks to the company’s unique view of the web: Search engines actually send more traffic to sites they index more (something that could be useful for publishers to note), around 30 percent of a website’s traffic comes from non-search engine bots (which includes both malicious and benign bots), and bot chatter typically takes up around 20 percent of traffic coming from residential ISPs like Comcast and Time Warner.
“It may reach a point where we become a target from bot makers,” Essaid said. He’s noticed job listings for known bot makers who ask for applicants with an ability to write software against “something like Distil Networks.”
Even though mobile devices are on the rise, the company hasn’t yet noticed any significant bot activity from them. Essaid notes that there’s the potential for interesting malicious attacks from smartphones down the line, though, since they’re typically always on and connected to a data source (and they’re getting faster every few months).
Other tidbits from the company’s first year in business:
- It has identified 114,410,520 different IPs used by bots
- The most common browsers used in attacks were Internet Explorer 6 (5.42 percent) and Firefox 3 (1.9 percent) — both notably very old versions
- 75.3 percent of all bots are just using scripts, not real browsers
- Java is the most common bot scripting language (4.71 percent), while Wget accounts for 3.65 percent
- 3 percent of all malicious bots pretend to be huge search engines like Google or Bing
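That last figure — bots impersonating Googlebot or Bingbot — is detectable with a technique the search engines themselves publicly recommend: a reverse DNS lookup on the visitor’s IP, a check that the resulting hostname belongs to the crawler’s real domain, then a forward lookup to confirm it resolves back to the same IP. The sketch below shows that published verification procedure; it is not Distil’s (undisclosed) detection pipeline, and the trusted-suffix list is a minimal assumption:

```python
import socket

# Domains that legitimate Google/Bing crawlers reverse-resolve into.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def suffix_ok(hostname: str) -> bool:
    """True if a reverse-DNS hostname belongs to a real crawler domain."""
    return hostname.rstrip(".").endswith(TRUSTED_SUFFIXES)

def is_real_search_bot(ip: str) -> bool:
    """Verify a claimed search-engine crawler via reverse + forward DNS."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]  # reverse lookup
        if not suffix_ok(hostname):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False
```

The forward-confirmation step matters: an attacker can set any reverse-DNS record for an IP they control, but they can’t make `crawl-*.googlebot.com` resolve back to it.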
Distil is a recent TechStars alum and has raised around $2 million so far from ff Venture Capital, IDEA Fund Partners, Cloud Power Fund, and others.
As for what’s next in the bot world, Essaid expects a major shift from big network players who’ve ignored the threat too long:
“The battle to fight off malicious attacks is going too slowly … Denial of service attacks [typically caused by bots] are becoming more and more prevalent, they’re jeopardizing big networks,” he said. “We predict there’s going to be a shift to ISPs being more involved in defense to stop things closer to the edge of where it’s originating, as opposed to the edge of where the destination is. … In the next 3 to 5 years, you’re going to see Cisco or Juniper come out with a way to tie all the ISPs together in a sort of threat coordination system.”