Exclusive: How Facebook is open-sourcing its data centers and servers

This is the first of a two-part exclusive on Facebook’s involvement with and creation of open source technologies. For these articles, we spoke with two of Facebook’s open source gurus, David Recordon and Amir Michael, about how the company is opening its infrastructure to other developers and organizations.

It’s one thing to open-source the code for your app — that’s a simple matter of mashing a button on GitHub. But how do you really open-source hardware?

Think about that: Facebook committed to open-sourcing the infrastructure of its data centers through the Open Compute Project, which launched back in April. But there’s more to maintaining an open-source project than just releasing data into the wild. You also have to accept contributions from other members of the community.

So how do you accept a patch for a motherboard? Or an improvement to a power supply?

This was just one of many challenges facing Amir Michael and the rest of Facebook’s open-source hardware team as they began redesigning the company’s servers and data centers. And to be frank, it wasn’t even the most challenging problem they’ve faced so far.

How Open Compute began

Michael, a former Googler, told VentureBeat that when he first came to Facebook, “I knew a lot about servers and data centers.” Not only did he understand the architecture of a network of servers; he even had hands-on experience as a data center tech, where he often worked until his hands were raw from repairing downed machines and replacing faulty components.

At the time Michael first came on board at Facebook, he said, “The way Facebook was scaling was tremendous. We were buying servers from HP and Dell, leasing server space from Digital Realty Trust.”

But Michael had an inkling that one of the most significant companies of the decade might not actually have been handling its data in the smartest, most efficient way. “I did a little analysis,” he said. “I went to NewEgg.com and put together an equivalent server, and it was about the same price, even though we were buying in these huge volumes.

“The business model didn’t make sense.”

After realizing that Facebook wasn’t doing itself any favors buying stock servers in huge quantities, Michael started investigating how the servers were cooled and powered. “I realized there was a lot of inefficiency there, too,” he said. “We looked at how to improve it. You get some efficiencies by optimizing the data center and ignoring the servers, and some by optimizing the servers and ignoring the data center. But you get the biggest benefits if you optimize both.”

In a nutshell, that’s how the Open Compute Project was born.

At the outset, Michael and the Facebook team tried to work with their existing hardware providers. “The vendors’ responses to the changes we wanted to make were lukewarm,” Michael said. “They offered to do a bunch of other things that weren’t too useful for us. They wanted us to buy what their other customers were using, but those modified machines weren’t as extreme as the customizations we were considering.”

Redesigning the server

From that point forward, Michael, Facebook’s manager for hardware design, started tearing apart every assumption about how servers were supposed to be built.

“We looked at why things were done the way they were, and it always came down to legacy. Challenging legacies and starting from scratch was the most innovative thing we did in the project,” he said.

For example, in Facebook’s new server design, the way power is delivered to the microprocessor is entirely different. The team took out transformations and distribution mechanisms and changed the power supply itself. Even the power cords and power strips have been entirely re-engineered, and the servers themselves were designed to be built and maintained without any tools.

In fact, Michael said the serviceability of the server was one of the team’s most important innovations. “When you have tens of thousands of servers, they break on an hourly basis. The hard drives fail, the memory fails. Our data center technicians are responsible for maintaining the servers. They spend their whole day installing new cabinets, new hard drives, etc. We wanted to make their jobs as easy as possible and a lot more efficient. We didn’t require any tools to assemble the servers, and most components are two to 10 times faster for basic service functions than on an average server.”

To test out this aspect of efficiency, Facebook had a prototype build party, which goes down in our book as one of the nerdiest ways to have fun on a Saturday night. “We let a bunch of engineers build the servers, we had pizza and beer, and we had a competition to see who could build a server the fastest,” said Michael. “A data center tech got it built in eight minutes.”

Facing resistance

When Michael was done redesigning the most fundamental aspects of the server, however, he didn’t initially get an enthusiastic response from a few key audiences. Engineers at Facebook who had to work on the servers were “skeptical,” he said, and even the new vendors were “hesitant.” Facebook’s management took some convincing, as well.

“In general, the resistance to change, getting people to accept a new architecture, was our biggest challenge,” he said. “Getting people to be open to trying something new was hard.”

That’s especially true for big hardware changes. Making radical software changes is, by contrast, cheap and easy. “With hardware,” said Michael, “you need a lab, new hires, prototypes. It requires several million dollars worth of investment. To their credit, Facebook management’s willingness to invest in this fringe project speaks to their ability to take big risks and allow for innovation to occur.”

Those big risks involved trip after trip to Taiwan to work with new manufacturers, bringing a mechanical engineer in-house, and drafting between 50 and 60 pages of specs for the new servers. “Doing design on a white board is one thing, but figuring out the details is where you can stumble,” said Michael, “especially when everything you’re doing is customized and entirely new from the ground up.”

Open-sourcing hardware

Finally, the Facebook team is still trying to figure out how to make the Open Compute Project truly open source by accepting contributions from the hardware hacker community.

“A lot of the tools aren’t there yet,” said Michael. “If someone wants to make a change to one of our circuit boards, it takes hundreds of thousands of dollars to get that package. The average hacker doesn’t have that. Most of the contributions so far come from other large companies. We’re hoping to change that in the future so a guy in his garage can design a motherboard.”

Michael continued to say that with the right software, that garage hacker could be making contributions as innovative as anything coming from a lab at HP or Dell. Currently, even the software used to design hardware is prohibitively expensive. But this is code — invisible, intangible ones and zeroes — and there’s no reason it shouldn’t be free.

Facebook wants to work with software vendors on free licenses for Open Compute Project contributors. The company is also considering working with other corporations and organizations (such as governments and large universities, which have similar computing needs) to create new, open-source software programs for hardware design.

Another prohibitive aspect is prototype creation. A typical prototype server might cost between five and 10 times more to build than a production server, so even garage hackers might need to get some kind of financial backing for those projects.

The philosophy of open-source at Facebook

We asked Michael if he had any ideological qualms about being an open-source guy at a proprietary software company. “As the guy who builds the infrastructure, I’m disconnected from the software that runs the site. It’s not a dilemma I experience on a daily basis,” he said.

But he continued, “It’s natural in an environment where companies are trying to remain profitable to keep some pieces of innovation to themselves. But they also need to be able to share and engage with the community. If you think about our business model, it’s about providing a valuable service to our users. The infrastructure we use to do that wasn’t a key piece of the business model. Our advantage is the product, not the servers. It’s not a core piece of IP.”

Also, Michael said, “Engineers are social beings, too, and they like being able to talk about the things they’re passionate about. And when you share information, you get benefits. You get feedback from other people about better, cheaper ways to do things.

“If you look at how Facebook was built, it uses a lot of open-source software. We’ve contributed back a lot in the software world, but we haven’t contributed back to the hardware world yet. No one has. But if we do that, maybe other companies can use the same kind of infrastructure. They don’t have to waste energy, and they don’t have to go through the same development process we did.”

Sharing information with universities has been particularly fruitful, Michael told us. “They have interesting solutions, but they don’t have enough data about real-world problems. They don’t know how industries operate. So by sharing information about our workloads and configurations, we get a lot of interest from universities.”

“Then there’s the environmental impact,” he said. “If we share these best practices, we’re hoping that other people can adopt them and have an impact on the environment as well.”
