
How a CloudFlare network engineer fixed a Google outage last night

Yesterday Google went down for about 30 minutes … until it was fixed by a network engineer who doesn’t even work for Google.

Tom Paseka works for CloudFlare, the content delivery network that handles more traffic than Amazon, Wikipedia, Twitter, Instagram, and Apple combined, delivering more than two billion pageviews per employee. The company knows a few things about the Internet.

What Paseka knew last night, apparently before any Google employees noticed, was that Google’s services appeared to be offline. Tracing the problem, he noticed an Indonesian Internet service provider in the path to Google — odd by any standard.

Particularly when CloudFlare is just a few miles from Google, not an ocean away.

It turns out, Paseka learned, that the Indonesian ISP Moratel was giving its users an incorrect route to Google. And because Moratel was trusted by other networks upstream, the incorrect route was propagating around the globe. As Paseka writes:

“And, quickly, the bad routes spread. It is unlikely this was malicious, but rather a misconfiguration or an error evidencing some of the failings in the BGP (Border Gateway Protocol) Trust model.”
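
For readers curious how one bad announcement can travel so far, here is a rough sketch in Python. It is a toy model, not real BGP: the network names and topology are invented for illustration, and the only selection rule is "prefer the shortest path," whereas real routers weigh many more factors. The point it tries to show is the trust problem: each network accepts whatever its neighbor announces, so a network that wrongly claims to be the destination can pull in traffic from anyone who sits closer to it than to the real one.

```python
from collections import deque

# Hypothetical topology (illustrative names only, not the networks in the story).
# Each entry lists the neighbors a network exchanges routes with.
LINKS = {
    "origin": ["t1"],
    "t1": ["origin", "t2"],
    "t2": ["t1", "cdn"],
    "cdn": ["t2", "upstream"],
    "upstream": ["cdn", "leaky_isp"],
    "leaky_isp": ["upstream"],
}


def spread_routes(links, announcers):
    """Return the path each network ends up using to reach a destination.

    `announcers` are the networks claiming to be the destination. Every
    network trusts its neighbors and keeps the shortest path it hears about
    (a simplification; real BGP path selection has more rules).
    """
    best = {name: [name] for name in announcers}
    queue = deque(announcers)
    while queue:
        current = queue.popleft()
        for neighbor in links[current]:
            if neighbor in best[current]:
                continue  # loop prevention: ignore paths that already contain us
            candidate = [neighbor] + best[current]
            # No check of who really owns the destination: shorter simply wins.
            if neighbor not in best or len(candidate) < len(best[neighbor]):
                best[neighbor] = candidate
                queue.append(neighbor)
    return best


# Normal operation: only the real origin announces itself.
normal = spread_routes(LINKS, ["origin"])

# Misconfiguration: "leaky_isp" also claims to be the destination.
leaked = spread_routes(LINKS, ["origin", "leaky_isp"])

for name in ("t2", "cdn", "upstream"):
    print(name, "normally uses", normal[name], "but after the leak uses", leaked[name])
```

In this made-up layout, the networks nearer the misconfigured ISP switch to the bad path while those nearer the real destination are unaffected, which is one way to picture why only part of the Internet lost access.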

The fix was simply notifying Moratel about the issue, which Paseka did. Three minutes later, the problem was fixed and Google’s services were back online. Of course … they had never gone down. But they had been inaccessible.

You may not have noticed unless you were in Hong Kong. Paseka estimated that the entire outage affected only about 3-5 percent of the Internet population.

No word on whether Google engineers sent their CloudFlare colleagues a box of donuts or a Google hoodie in thanks.

