
IT and network security teams struggle when migrating or deploying a workload to a public cloud. To succeed in today’s complex environments, each side must understand how the current challenges came about in order to solve them. Having lived on both sides of the fence, I offer stories from each perspective, along with lessons that cloud security pros can build on today.

Prequel: Harmony in the DMZ

IT and network security has always been a challenging space. Juggling budget, cost, app performance, app time-to-market and system stability has always been both a major headache and a delicate balancing act. Not to mention the constantly evolving threatscape, where crime does pay and the bad guys get all the coolest weapons first. To top it off, a major failure can be a “resume generating event” for many.

Yet, despite all these challenges, by the early 2010s the industry had stabilized into a mature, effective space, due to factors like Moore’s law, virtualization, vendor maturity and enterprise-grade cryptography. Back then, the DMZ was king. Here, internet access could be easily controlled, critical apps and infrastructure were locked up tight, uniform policy had a strong anchor point, and centralized visibility was all but guaranteed.

Under the iron shield of the DMZ, the industry prospered, ecommerce and mobility exploded, and people finally trusted the internet as the best way to do business. The golden age of the internet was born. Sure, the hackers were out there, and cybercrime was on the rise, but a well-designed DMZ could withstand all but the most sophisticated attacks. The future was bright, and IT security was one of the hottest jobs out there.



Part 1: The cloud revolution, an earthquake in the network

Lost in the euphoria and digital gold rush of the early 2000s, another revolution was brewing. Soon people in my inner IT circles started talking about how service delivery was changing, growing, driven by the influx of investment, talent and technology that seemed to be everywhere. They called it “the cloud.” On-demand compute, driven by code and credit cards. Pay-by-the-day storage. Turnkey services that run in your browser or on your smartphone. Crazy talk.

At that point, I was at F5 Networks and many were skeptical. I was not. Microsoft was my dedicated customer at the time, and they had a special fleet of load balancers that were in use by a new business unit called Azure. I remember that the Azure control plane kept tipping over our boxes. Management interface flapping, management CPU pegged. We had never seen this kind of thing before; it was like they were doing a self-inflicted denial-of-service via change control.  

Clearly Microsoft was onto something here. Automating the network and the entire platform was a game-changer. And the fact that they were tipping our boxes was bad. Really bad. It meant we weren’t equipped for this sort of thing.

“No, man, look how many calls are coming in, this is insane,” I remember a team member saying. “Nobody needs to push this many changes that fast. Don’t they know how to run a data center?”

And there it was: the jarring clash between a decade of networking best practice and the new revolution of infrastructure as code, a shift that would grow to shape the entire industry. Was this the first disagreement in the wild over traditional networking vs. infra-as-code? Unlikely, but it was certainly predictive of a much bigger issue that many would face as they moved to cloud years later.

Most industry people around 2010 could not rationalize why anyone would want to change the network that often. Networks at the time were meant to be immutable and perpetual. And your network security model depended on this bedrock of stability. To change the network structure on a daily or even hourly basis meant that the DMZ would be constantly under threat. Production firewall changes were done twice a year, in the middle of the night, everyone dressed in surgical gear, sweating. Changing the firewall every day? It was unthinkable.

And here is the beating heart of our first network security lesson: The legacy model of the DMZ is a failure in the cloud.

  1. Cloud networks are constantly changing. Sometimes daily. Managing your virtual firewall in the cloud using traditional methods creates a high-touch environment, risks misconfigurations and can lead to legacy rule bloat.
  2. Firewall network ACLs do not always map to real or present actors; containers, PaaS and SaaS workloads are name-based. Consider using cloud-native tags to classify and govern cloud apps and workloads.
  3. It is better to use cloud-native security stacks, or better yet, a good orchestrator for them, whenever possible. This is because they are distributed, free, close to each VM, and, with the right approach, programmable and agile.
  4. Focus on cloud-native stacks for intra-VPC/VNet traffic. Avoid pulling intra-VPC/VNet traffic into a virtual firewall. Remember, VPCs and VNets are logical constructs. Physical distance (read: latency) between VMs is a function of things like availability zones and proximity placement groups, not membership in a VPC/VNet.
  5. Reserve your virtual firewall only for inter-spoke East-West traffic and/or North-South traffic. Most network changes happen within the VPC/VNet spoke (if designed correctly) which can protect your firewall from being subjected to constant configuration change.
  6. Don’t build too many VPC/VNet spokes. Try to make them large enough to handle an entire LOB or application. Use subnets and subnet-level security groups between application tiers. Only break an application tier off to its own VPC/VNet if you must do firewall inspection between the tiers, which is an anti-pattern in cloud.
  7. Strongly avoid the temptation to force your default route to the internet back through your on-prem DMZ. This introduces latency, can complicate VM-to-PaaS/SaaS architectures and can fill private pipes. Cloud networks are now the largest private networks in the world. Use them.
  8. When building a virtual DMZ, you must have pristine route control. Routes are your VLANs in cloud. Look for platforms that provide comprehensive route control in the cloud and automate routing to your firewalls. You will never keep up with cloud, or the industry, if you have to make static route changes each time a VPC or VNet is born.
  9. Static route summarization to the virtual DMZ will work great, right up to your first IP overlap with a B2B partner, your first M&A event or your first multi-cloud deployment. Look for platforms that support Enterprise NAT at scale in cloud. IP overlap can create pain points in your cloud security design.
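
The IP overlap problem in the last point is easy to check for before it bites. Below is a minimal sketch in Python, assuming you can export each network's CIDR block into a simple name-to-range mapping. The network names and address ranges are made up for illustration; only the standard-library `ipaddress` module is used.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs_by_name):
    """Return sorted pairs of network names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs_by_name.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


# Hypothetical address plan: two internal spokes, a B2B partner and an acquired company.
address_plan = {
    "prod-spoke": "10.1.0.0/16",
    "dev-spoke": "10.2.0.0/16",
    "partner-b2b": "10.1.128.0/20",
    "acquired-co": "10.2.0.0/16",
}

for a, b in find_overlaps(address_plan):
    print(f"overlap: {a} <-> {b}")
```

Any pair this reports is a place where plain route summarization toward the virtual DMZ would be ambiguous, and where NAT or re-addressing would be needed.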

Part 2: Virtual firewalls, a trial by fire

Most IT security pros are not well-prepared for best-practice design in the cloud. They are doing what they know how to do best, what has served them well for years: they are improvising. This creates hit-or-miss situations with design that can come back to haunt them.

Even the existing courses and certifications for cloud networking, which are expensive and time-consuming, don’t cover all angles or cases, and there is precious little talent in cloud network design to go around. Until skill gaps are addressed, the typical experience will be trial by fire.

I saw this firsthand in the late 2010s at Microsoft Azure. A large global customer kept suffering rolling outages on their virtual firewalls. The CPUs on their firewalls were fine, hovering under 40%. But the flow tables of the virtual NICs were full and they were dropping packets, and the customer was both lost and frustrated.

The problem, it turned out, was that the customer didn’t realize that Azure flow tables (the connection tracking mechanism within the NIC) share the same flow limits, because all virtual NICs in Azure run the same code. Flow table behavior is globally uniform. It has to be, because the control plane cannot tolerate differences in the stack at hyperscale. Cloud NICs are intelligent: they do all the native security, routing and connection processing. But each NIC has limits because the hypervisor under them has limits.
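
To see why a firewall can drop packets while its CPU sits idle, consider a toy model of a connection-tracking table. This is only an illustrative sketch, not Azure's implementation: the `FlowTable` class, its capacity of three and the sample 5-tuples are all hypothetical.

```python
class FlowTable:
    """Toy connection-tracking table with a fixed capacity (hypothetical model)."""

    def __init__(self, max_flows):
        self.max_flows = max_flows
        self.flows = set()

    def admit(self, flow):
        """Return True if the packet's flow is already tracked or can be added."""
        if flow in self.flows:
            return True   # existing connection: passes regardless of load
        if len(self.flows) >= self.max_flows:
            return False  # table full: the new connection is dropped
        self.flows.add(flow)
        return True


# Five new connections (source IP, source port, dest IP, dest port, protocol)
# arrive at a NIC whose table holds only three flows.
table = FlowTable(max_flows=3)
packets = [(f"10.0.0.{i}", 40000 + i, "10.9.9.9", 443, "tcp") for i in range(5)]
results = [table.admit(p) for p in packets]
print(results)  # [True, True, True, False, False]
```

Nothing in the model depends on CPU: once the table is full, new connections fail no matter how much compute is left.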

“So what do we do?” the customer asked. “We tried building bigger VMs, but it didn’t work!”

More, not bigger

Therein was the problem. The solution wasn’t to build bigger VMs, but more of them. Thin and wide. You have to spread flows across lots of medium-sized virtual instances. The trouble here was that their vendor did not support VM scale sets, so they had to build each firewall individually. That took hours and really was not a tenable model.
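
The thin-and-wide arithmetic is simple enough to sketch. The function below is a rough sizing aid under stated assumptions: the per-NIC flow limit and the peak flow count are placeholder numbers, not published figures for any cloud, and the 30% headroom is an arbitrary safety margin.

```python
def instances_needed(peak_flows, flows_per_nic, headroom_pct=30):
    """Instances required so each stays below its flow limit minus headroom."""
    usable = flows_per_nic * (100 - headroom_pct) // 100
    return -(-peak_flows // usable)  # integer ceiling division


# Hypothetical numbers: 2 million concurrent flows at peak,
# and a per-NIC limit of 500,000 flows that is the same for every VM size.
print(instances_needed(peak_flows=2_000_000, flows_per_nic=500_000))  # 6
```

If the limit is per NIC rather than per core, a larger VM leaves `flows_per_nic` unchanged, so the only variable left to turn is the instance count.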

The bitter irony that one of the masters of industrial automation did not automate their cloud security struck me first as funny, then as a profound revelation that has stuck with me ever since.

This is how one scales in cloud, but it was a bitter pill for that customer to swallow. The right thing and the easy thing rarely overlap. The frustrating thing for that customer (and many others) was that cloud was supposed to be the easy thing.

Cloud security: beyond legacy firewall design

And here is the beating heart of our second and third IT network security lessons: The legacy model of firewall design is a failure in the cloud, and few know how to do it right.

  1. Your virtual firewalls have no idea that they are running in a cloud. They believe they are connected to wires. They are not. They are connected to an SDN stack which might be almost as smart as your firewall.
  2. Orchestration of your firewall tier is key. Choose a platform that enables firewall orchestration and make sure you build a pipeline around this. Some platforms will do both for you under the hood.
  3. Embrace thin and wide. Unless you are trying to solve for fat flows (massive data streams, very unfriendly to cloud VMs), prepare to scale horizontally and avoid the temptation to scale vertically.
  4. Some vendors will tell you that you need multiple firewalls for multiple purposes or functions in cloud. Technically speaking, you do not:
    1. There is no performance difference between one set of four VMs and two sets of two VMs, provided they are the same kind of VMs.  
    2. Programmatic routing control in the cloud means your firewall can be both East/West and North/South at the same time. Remember the points above. Thin and wide. Horizontal scale. A core is a core. It’s all software now.
    3. If you do decide to create many firewall instances to address different use cases or locations, be sure to have a good strategy in play for policy management and Day 2 operations.
    4. You will be generating a ton of network data in the cloud — will your firewall see it all? Should it see it all? This is a challenge. Look for platforms that do data collection both within and across the entire network so that your firewall does not have to become a data hog and a single point of failure for your network eyes and ears.
  5. Consider the idea that the network itself might become the best firewall in the cloud, given that it is distributed, programmatic, low-cost and growing in capability every day. However, some critical blockers remain:
    1. Cloud networks are not very good at app-layer security.
    2. Cloud networks are difficult to orchestrate at medium to large scale.
    3. Cloud networks and security groups are wildly different between the major vendors.
    4. Cloud networks have certain limitations because they are part of a large multi-tenant platform.
  6. Look for solutions that build on native SDN security stacks while helping to overcome these deficiencies, to offer the best of both worlds. These solutions should add app-layer awareness, good multicloud orchestration and a simple policy model that abstracts differences between CSPs.
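
What might a simple, provider-neutral policy model look like? One possibility, sketched below, is to express rules in terms of workload tags rather than IP addresses and to evaluate them with a default-deny check. The `Rule` type, the tag names and the workloads are all hypothetical; a real platform would translate such rules into each CSP's native security constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_tags: frozenset  # tags the source workload must carry
    dst_tags: frozenset  # tags the destination workload must carry
    port: int


def is_allowed(rules, src_tags, dst_tags, port):
    """Default deny: allow only if some rule's tag selectors and port all match."""
    return any(
        r.src_tags <= src_tags and r.dst_tags <= dst_tags and r.port == port
        for r in rules
    )


# Hypothetical policy: web may reach app, app may reach the production database.
rules = [
    Rule(frozenset({"tier:web"}), frozenset({"tier:app"}), 8443),
    Rule(frozenset({"tier:app"}), frozenset({"tier:db", "env:prod"}), 5432),
]

# Hypothetical workloads in two different clouds, described only by tags.
workloads = {
    "web-1": {"tier:web", "env:prod", "cloud:aws"},
    "app-1": {"tier:app", "env:prod", "cloud:azure"},
    "db-1": {"tier:db", "env:prod", "cloud:azure"},
}

print(is_allowed(rules, workloads["web-1"], workloads["app-1"], 8443))  # True
print(is_allowed(rules, workloads["web-1"], workloads["db-1"], 5432))   # False
```

Because the rules never mention an address or a provider, the same policy holds when a workload moves, scales out or lands in a different cloud.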

With the benefit of historical context, a story can take on new meaning. I hope these backstories can set the scene within your organization and provide a fresh perspective to kickstart your journey in cloud security. Best of luck and happy hunting!

Bryan Woodworth is principal solutions strategist at Aviatrix.

