Facebook is updating its policies to explicitly allow a handful of third-party search engines to crawl public content.
Previously, Facebook banned robots, spiders, scrapers, and harvesting bots from automatically collecting data across the social network’s pages unless their creators had written permission. That drew criticism that the social network was trying to have it both ways: it could juice up its search engine optimization and be discovered on Google while cracking down on emerging threats from smaller companies that might use the data in innovative ways.
The company’s chief technology officer, Bret Taylor, countered that criticism on Hacker News today, saying that Facebook’s policies were meant to protect users from “sleazy” crawlers that might grab their data and resell it.
He added that Facebook hadn’t meant to muck up long-standing norms on the web. The company is changing its robots.txt file, a special document that asks search engine robots to ignore certain directories, so that it includes approved partners like Google, Baidu and Bing.
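As an illustration of the approach described above, an allowlist-style robots.txt might look something like the following. This is a hypothetical sketch, not Facebook’s actual file; the user-agent strings are the real crawler names used by Google, Baidu, and Microsoft’s Bing, but the specific rules are assumed:

```
# Hypothetical allowlist-style robots.txt (illustrative only)

# Approved crawlers may index the site
User-agent: Googlebot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: bingbot
Allow: /

# All other robots are asked not to crawl anything
User-agent: *
Disallow: /
```

The final wildcard rule is what makes this an allowlist: any crawler not explicitly named falls through to the blanket Disallow.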
“It was bad for us to stray from Internet standards and conventions by having a robots.txt that was open and a separate agreement with additional restrictions. This was just a lapse of judgment,” Taylor wrote.
Still, the original policy stands: if an external search engine wants to crawl the social network, it has to contact Facebook to get on the whitelist.