IT and network security face a lot of struggles when migrating or deploying a workload to a public cloud. To be successful in today’s complex environments, each side must understand how the current challenges came about in order to solve them. Having lived on both sides of the fence, I offer stories from each perspective, along with lessons that cloud security pros can build from today.
Prequel: Harmony in the DMZ
IT and network security has always been a challenging space. Weighing budget, cost, app performance, app time-to-market and system stability against one another has always been both a major headache and a delicate balancing act. Not to mention the constantly evolving threatscape, where crime does pay and the bad guys get all the coolest weapons first. To top it off, a major failure can be a “resume generating event” for many.
Yet, despite all these challenges, by the early 2010s the industry had stabilized into a mature, effective space, due to factors like Moore’s law, virtualization, vendor maturity and enterprise-grade cryptography. Back then, the DMZ was king. Here, internet access could be easily controlled, critical apps and infrastructure were locked up tight, uniform policy had a strong anchor point, and centralized visibility was all but guaranteed.
Under the iron shield of the DMZ, the industry prospered, ecommerce and mobility exploded, and people finally trusted the internet as the best way to do business. The golden age of the internet was born. Sure, the hackers were out there, and cybercrime was on the rise, but a well-designed DMZ could withstand all but the most sophisticated attacks. The future was bright, and IT security was one of the hottest jobs out there.
Part 1: The cloud revolution, an earthquake in the network
Lost in the euphoria and digital gold rush of the early 2000s, another revolution was brewing. Soon people in my inner IT circles started talking about how service delivery was changing and growing, driven by the influx of investment, talent and technology that seemed to be everywhere. They called it “the cloud.” On-demand compute, driven by code and credit cards. Pay-by-the-day storage. Turnkey services that run in your browser or on your smartphone. Crazy talk.
At that point, I was at F5 Networks and many were skeptical. I was not. Microsoft was my dedicated customer at the time, and they had a special fleet of load balancers that were in use by a new business unit called Azure. I remember that the Azure control plane kept tipping over our boxes. Management interface flapping, management CPU pegged. We had never seen this kind of thing before; it was like they were doing a self-inflicted denial-of-service via change control.
Clearly Microsoft was onto something here. Automating the network and the entire platform was a game-changer. And the fact that they were tipping our boxes was bad. Really bad. It meant we weren’t equipped for this sort of thing.
“No, man, look how many calls are coming in, this is insane,” I remember a team member saying. “Nobody needs to push this many changes that fast. Don’t they know how to run a data center?”
And there it was. The jarring paradigm shift between a decade of networking best practice and the new revolution of infrastructure as code, a shift that would grow to shape the entire industry. Was this the first disagreement in the wild over traditional networking vs. infra-as-code? Unlikely, but it was certainly predictive of a much bigger issue that many would face as they moved to cloud years later.
Most industry people around 2010 could not rationalize why anyone would want to change the network that often. Networks at the time were meant to be immutable and perpetual. And your network security model depended on this bedrock of stability. To change the network structure on a daily or even hourly basis meant that the DMZ would be constantly under threat. Production firewall changes were done twice a year, in the middle of the night, everyone dressed in surgical gear, sweating. Changing the firewall every day? It was unthinkable.
And here is the beating heart of our first network security lesson: The legacy model of the DMZ is a failure in the cloud.
Part 2: Virtual firewalls, a trial by fire
Most IT security pros are not well-prepared for best-practice design in the cloud. They are doing what they know how to do best, what has served them well for years: they are improvising. This creates hit-or-miss situations with design that can come back to haunt them.
Even the existing courses and certifications for cloud networking, which are expensive and time-consuming, don’t cover all angles or cases, and there is precious little talent in cloud network design to go around. Until skill gaps are addressed, the typical experience will be trial by fire.
I saw this firsthand in the late 2010s at Microsoft Azure. A large global customer kept suffering rolling outages on their virtual firewalls. The CPUs on their firewalls were fine, hovering under 40%. But the flow tables of the virtual NICs were full and they were dropping packets, and the customer was both lost and frustrated.
The problem, it turned out, was that the customer didn’t realize that Azure flow tables (the connection tracking mechanism within the NIC) share the same flow limits, because all virtual NICs in Azure run the same code. Flow table behavior is globally uniform. It has to be, because the control plane cannot tolerate differences in the stack at hyperscale. Cloud NICs are intelligent: they do all the native security, routing and connection processing. But each NIC has limits because the hypervisor under them has limits.
“So what do we do?” the customer asked. “We tried building bigger VMs, but it didn’t work!”
More, not bigger
Therein was the problem. The solution wasn’t to build bigger VMs, but more of them. Thin and wide. You have to spread flows across lots of medium-sized virtual instances. The trouble here was that their vendor didn’t support VM scale sets. They had to build each firewall individually. That took hours and really was not a tenable model.
The bitter irony that one of the masters of industrial automation doesn’t automate their cloud security struck me first as funny, then as a profound revelation that has stuck with me forever.
This is how one scales in cloud, but it was a bitter pill for that customer to swallow. The right thing and the easy thing rarely overlap. The frustrating thing for that customer (and many others) was that cloud was supposed to be the easy thing.
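To make "thin and wide" concrete, here is a minimal sketch of the idea, not a model of Azure itself. The `FlowTable` class, the `instances_needed` and `spread` helpers, the per-NIC limit of 1,000 flows and the 30% headroom are all hypothetical values chosen for illustration. In a real deployment a cloud load balancer does the distribution, not a hash written by hand.

```python
import math
import zlib
from typing import List


class FlowTable:
    """Toy connection-tracking table with a hard cap on tracked flows.

    The cap is the same no matter how large the VM is, which is why
    building a bigger VM does not help once the table is full.
    """

    def __init__(self, max_flows: int) -> None:
        self.max_flows = max_flows  # hypothetical per-NIC limit
        self.flows = set()
        self.dropped = 0

    def admit(self, flow_key: str) -> bool:
        """Return True if the flow is tracked, False if it is dropped."""
        if flow_key in self.flows:
            return True  # existing connection already holds a slot
        if len(self.flows) >= self.max_flows:
            self.dropped += 1  # table full: the new connection is dropped
            return False
        self.flows.add(flow_key)
        return True


def instances_needed(peak_flows: int, flows_per_nic: int, headroom: float = 0.3) -> int:
    """Instances required so each stays under its flow cap with some headroom."""
    usable = flows_per_nic * (1 - headroom)
    return math.ceil(peak_flows / usable)


def spread(flow_keys: List[str], instance_count: int, flows_per_nic: int) -> List[FlowTable]:
    """Assign each flow to one instance using a stable hash of its key."""
    tables = [FlowTable(flows_per_nic) for _ in range(instance_count)]
    for key in flow_keys:
        index = zlib.crc32(key.encode()) % instance_count
        tables[index].admit(key)
    return tables


if __name__ == "__main__":
    # 1,500 distinct connections, identified by a simplified 5-tuple string.
    flows = [f"10.0.0.1:{1024 + i}->10.0.1.1:443/tcp" for i in range(1500)]

    single = spread(flows, 1, flows_per_nic=1000)
    print("one instance, dropped:", single[0].dropped)

    count = instances_needed(len(flows), flows_per_nic=1000)
    wide = spread(flows, count, flows_per_nic=1000)
    print(count, "instances, dropped:", sum(t.dropped for t in wide))
```

With a single instance, the 1,500 connections overflow the 1,000-entry table and 500 are dropped, regardless of how idle the CPU is. Spreading the same connections across the instance count returned by `instances_needed` keeps each table well under its cap.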
Cloud security: beyond legacy firewall design
And here is the beating heart of our second and third IT network security lessons: The legacy model of firewall design is a failure in the cloud, and few know how to do it right.
With the benefit of historical context, a story can take on new meaning. I hope these backstories can set the scene within your organization and provide a fresh perspective to kickstart your journey in cloud security. Best of luck and happy hunting!
Bryan Woodworth is principal solutions strategist at Aviatrix.
