Facebook is updating its policies to explicitly allow a handful of third-party search engines to crawl public content.

Previously, Facebook banned robots, spiders, scrapers, and harvesting bots from automatically collecting data across the social network's pages unless their creators had written permission. Critics charged that the social network was trying to have it both ways: it could juice its search engine optimization and be discovered on Google while cracking down on smaller companies that might use the data in innovative ways.

The company’s chief technology officer Bret Taylor countered that criticism on Hacker News today, saying that Facebook’s policies were meant to protect users from “sleazy” crawlers that might grab their data and resell it.

He added that Facebook hadn’t meant to muck up long-standing norms on the web. The company is changing its robots.txt file, the plain-text document that tells search engine crawlers which parts of a site they may and may not visit, so that it explicitly admits approved partners like Google, Baidu and Bing.
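A whitelist-style robots.txt of the kind described might look roughly like this. The user-agent tokens are the real ones those engines use (Googlebot, Baiduspider, bingbot), but the file itself is a sketch, not Facebook’s actual robots.txt:

```
# Hypothetical whitelist-style robots.txt (illustrative only,
# not Facebook's actual file).

# An empty Disallow line means the named crawler may fetch everything.
User-agent: Googlebot
Disallow:

User-agent: Baiduspider
Disallow:

User-agent: bingbot
Disallow:

# Every other crawler is blocked from all paths.
User-agent: *
Disallow: /
```

The convention is that an empty `Disallow:` permits the named crawler full access, while `Disallow: /` under `User-agent: *` shuts out everyone not listed above it.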

Taylor said:

“It was bad for us to stray from Internet standards and conventions by having a robots.txt that was open and a separate agreement with additional restrictions. This was just a lapse of judgment.

We are updating our robots.txt to explicitly allow the crawlers of search engines that we currently allow to index Facebook content and disallow all other crawlers. We will whitelist crawlers when legitimate companies contact us who want to crawl us (presumably search engines). For other purposes, we really want people using our API because it has explicit controls around privacy and has important additional requirements that we feel are important when a company is using users’ data from Facebook (e.g., we require that you have a privacy policy and offer users the ability to delete their data from your service).”

Still, the original policy stands: any other search engine that wants to crawl the social network has to contact Facebook to get on the whitelist.