CloudFlare goes down for an hour, taking its 785K customers with it

In an ironic twist this morning, CloudFlare, a company that speeds up and protects websites, suffered an outage that took down the 785,000 sites using its service, including WikiLeaks and 4chan.

A change pushed out to the company’s routers crashed them, TechCrunch reports. Chief executive Matthew Prince told the site, “If you sent a packet to one of our IP addresses, you would get back a response that there was no router.” The outage lasted almost an hour.

CloudFlare sits as a line of defense between its customers’ sites and web visitors, which lets it cache sites for faster page loads and makes it harder to take sites down with distributed denial of service (DDoS) attacks. But that also means that if CloudFlare goes down, so do its customers.

The company runs 23 data centers globally, all of which were affected by the outage. “These data centers are connected to the rest of the Internet using routers,” Prince explained in a blog post this morning. “These routers announce the path that, from any point on the Internet, packets should use to reach our network. When a router goes down, the routes to the network that sits behind the router are withdrawn from the rest of the Internet.”
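To make Prince's description concrete, here is a minimal Python sketch of route announcement and withdrawal. It is a toy model, not how BGP or CloudFlare's network is actually implemented: the `RoutingTable` class, the router names `edge-1` and `edge-2`, and the documentation prefix `203.0.113.0/24` are all assumptions made for illustration.

```python
class RoutingTable:
    """Toy view of which routers the rest of the Internet can use for a prefix."""

    def __init__(self):
        # Maps a network prefix to the set of routers currently announcing it.
        self._routes = {}

    def announce(self, prefix, router):
        """Record that a router is advertising a path to the prefix."""
        self._routes.setdefault(prefix, set()).add(router)

    def withdraw_router(self, router):
        """Drop every route announced by a router that has gone down."""
        for prefix in list(self._routes):
            self._routes[prefix].discard(router)
            if not self._routes[prefix]:
                # No router left announcing this prefix: it becomes unreachable.
                del self._routes[prefix]

    def lookup(self, prefix):
        """Return the routers that can reach the prefix, or None if there is no route."""
        routers = self._routes.get(prefix)
        return sorted(routers) if routers else None


# Hypothetical prefix and router names, used only for this example.
table = RoutingTable()
table.announce("203.0.113.0/24", "edge-1")
table.announce("203.0.113.0/24", "edge-2")
before_outage = table.lookup("203.0.113.0/24")

# Both routers crash, so both sets of announcements are withdrawn.
table.withdraw_router("edge-1")
table.withdraw_router("edge-2")
after_outage = table.lookup("203.0.113.0/24")
```

In this sketch, losing one router still leaves a path to the network. Only when every router announcing the prefix is gone does the lookup find no route at all, which is roughly the situation the article describes across all 23 data centers.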

CloudFlare’s troubles began when it noticed that one of its customers was being targeted by a DDoS attack. A CloudFlare team member created a rule for the company’s Juniper routers to block the attacker’s unusually large packets. But instead of simply accepting the rule, the routers consumed all of their RAM when they encountered it.
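The kind of rule described above, one that drops packets based on their size, can be sketched in a few lines of Python. This is only an illustration of the idea: it is not Juniper's rule syntax, and the `LengthRule` class, the `filter_packets` helper, and the byte thresholds are hypothetical values that do not come from CloudFlare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthRule:
    """Matches packets whose length falls inside an inclusive range of bytes."""

    min_length: int
    max_length: int

    def matches(self, packet_length):
        return self.min_length <= packet_length <= self.max_length


def filter_packets(packet_lengths, rule):
    """Return the lengths of the packets that the rule lets through."""
    return [length for length in packet_lengths if not rule.matches(length)]


# Hypothetical thresholds: treat anything from 9,000 to 65,535 bytes as attack traffic.
block_large = LengthRule(min_length=9_000, max_length=65_535)

# Two ordinary packets and two oversized ones (made-up sizes).
observed = [64, 1_500, 9_500, 60_000]
allowed = filter_packets(observed, block_large)
```

A filter like this is cheap to evaluate in software. The outage came from how the routers handled the rule once it was pushed, not from the idea of matching on packet length.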

Prince detailed how the company responded to the widespread outage in his blog post:

CloudFlare’s ops and network teams were aware of the incident immediately because of both internal and external monitors we run on our network. While it wasn’t initially clear the reason the routers had crashed, it was clear that it was an issue caused by an inability for packets to find a route to our network. We were able to access some routers and see that they were crashing when they encountered this bad rule. We removed the rule and then called the network operations teams in the data centers where our routers were unresponsive to ask them to physically access the routers and perform a hard reboot.

“This is a completely unacceptable event to us,” Prince told TechCrunch. “In our four years of life, this is our third significant outage,” he added.

San Francisco-based CloudFlare has raised around $22 million in funding from New Enterprise Associates, Pelion Venture Partners, and Venrock.

Update: Juniper issued the following statement:

Juniper Networks is aware of and investigating a reported network outage with one of our customers, Cloudflare. While we have not completed our investigation, we believe this incident was triggered by a product issue that Juniper identified last October, when a patch was also made available. Our customer support team is actively supporting Cloudflare in its efforts to resolve the issue and we are not aware of any other customers experiencing similar issues.