In The Cost of Cloud, a Trillion-Dollar Paradox, Andreessen Horowitz Capital Management’s Sarah Wang and Martin Casado highlighted the case of Dropbox closing down its public cloud deployments and returning to the datacenter. Wang and Casado took the fact that Dropbox and other enterprises realized savings of 50% or more by bailing on some or all of their cloud deployments and extrapolated it to the wider cloud-consuming ecosphere. Wang and Casado’s conclusion? Public cloud is more than doubling infrastructure costs for most enterprises relative to legacy datacenter environments.
Unfortunately, the article contains a number of common misconceptions. As practitioners supporting over 800 cloud environments, we see deployments at every stage of life — from as early as the architecture (planning) phase all the way through to long-duration deployments that have already been subjected to multiple rounds of carefully targeted optimization. In our view, a generalized debate over whether on-prem environments are cheaper to operate than cloud is incredibly simplistic.
Well-architected and well-operated cloud deployments will be highly successful compared to datacenter deployments in most cases. However, “highly successful” may or may not mean less expensive. The cost of cloud shouldn’t be compared with the cost of a datacenter in isolation. Instead, it’s important to analyze the differential ROI of one set of costs versus the alternative. While this is true for any expenditure, it’s doubly true for public cloud, since migration can have profound impacts on revenue. Indeed, the major benefits of the cloud are often related to revenue, not cost.
Two common examples of the cloud’s ability to enhance revenue:
- Acceleration of time-to-market cycles
- The possibility of rapid expansions in infrastructure (within or even across geographies) to capture revenue blooms
The revenue enhancements associated with both can exceed any theoretical cost premiums for cloud by significant amounts, resulting in very attractive returns on investment when these technologies are applied well.
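One way to make "differential ROI" concrete is to ask what the extra spend on the more expensive option buys in extra revenue. The sketch below is only an illustration: the `Option` class, the `differential_roi` helper, and every dollar figure are hypothetical assumptions, not numbers from the article or from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way of hosting a workload. All figures are hypothetical."""
    name: str
    annual_cost: float     # infrastructure spend per year
    annual_revenue: float  # revenue attributable to the workload per year

    def roi(self) -> float:
        # Standalone ROI: net gain per dollar of infrastructure spend.
        return (self.annual_revenue - self.annual_cost) / self.annual_cost


def differential_roi(pricier: Option, cheaper: Option) -> float:
    """Net gain earned per *extra* dollar spent on the pricier option."""
    extra_cost = pricier.annual_cost - cheaper.annual_cost
    if extra_cost <= 0:
        raise ValueError("the first option must cost more than the second")
    extra_revenue = pricier.annual_revenue - cheaper.annual_revenue
    return (extra_revenue - extra_cost) / extra_cost


# Assumed scenario: cloud costs twice as much but captures more revenue.
datacenter = Option("datacenter", annual_cost=1_000_000, annual_revenue=5_000_000)
cloud = Option("cloud", annual_cost=2_000_000, annual_revenue=7_500_000)

if __name__ == "__main__":
    print(f"differential ROI of cloud: {differential_roi(cloud, datacenter):.0%}")
```

In this assumed scenario the cloud doubles the infrastructure bill, yet each extra dollar still returns more than it costs. A cost-only comparison would miss that entirely.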
Short-term thinking brings short-term results
An oversimplified counterexample to Wang and Casado’s assertions will make our logic clear. Suppose a private equity firm approaches a manufacturing concern and advises them that they can cut their cost of revenue in half by shuttering half of their factory. What happens to production volumes if they follow this advice? What happens to revenue? If the plant was running at or near capacity, their production capacity, and therefore their revenue, would also be cut in half. Now imagine the half of the factory they closed actually had the most productive assembly lines. Their costs have dropped by half, but their revenue will drop by more. This approach may result in some favorable near-term financial results, but investors with longer-term goals are going to take it on the chin down the road when revenue collapses. If an enterprise bails on the cloud to save costs, how might their time-to-market or revenue elasticity be impacted? What opportunities would be forgone? These dynamics must be considered, and that means analyzing ROI, not isolated metrics like cost of sales or cost of goods sold.
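The factory arithmetic can be worked through with a few made-up numbers. The sketch below assumes four assembly lines with identical running costs and unequal output; the line names, costs, and revenues are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical factory: equal cost per line, unequal revenue per line.
LINES = [
    {"name": "A", "cost": 15, "revenue": 40},
    {"name": "B", "cost": 15, "revenue": 30},
    {"name": "C", "cost": 15, "revenue": 20},
    {"name": "D", "cost": 15, "revenue": 10},
]


def totals(lines):
    """Return (total cost, total revenue) for a list of assembly lines."""
    return sum(l["cost"] for l in lines), sum(l["revenue"] for l in lines)


def close_most_productive(lines, n_closed):
    """Return the lines that remain after closing the n highest-revenue ones."""
    by_revenue = sorted(lines, key=lambda l: l["revenue"], reverse=True)
    return by_revenue[n_closed:]


def fractional_drop(before, after):
    """Fraction by which a quantity fell, e.g. 0.5 for a 50% drop."""
    return (before - after) / before


if __name__ == "__main__":
    cost_before, revenue_before = totals(LINES)
    cost_after, revenue_after = totals(close_most_productive(LINES, 2))
    print(f"cost drop:     {fractional_drop(cost_before, cost_after):.0%}")
    print(f"revenue drop:  {fractional_drop(revenue_before, revenue_after):.0%}")
    print(f"profit before: {revenue_before - cost_before}")
    print(f"profit after:  {revenue_after - cost_after}")
```

With these assumed figures, halving the cost base removes well over half of the revenue and wipes out the profit, which is the outcome the counterexample warns about.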
The Dropbox repatriation: statistical cherry-picking
What’s more, by extrapolating the results of successful repatriations to the wider ecosphere of cloud consumers, the authors take entirely too many liberties with the notion that one cloud deployment can be easily compared with another from a cost perspective.
The true “cost” of a public cloud is a function of:
- The appropriateness of cloud for specific workloads
- The architecture
- Efficient operation
The cloud deployments that were successfully repatriated had, almost by definition, fallen short along some or all of these dimensions; the success of the repatriation is itself the evidence. But even in cases where the repatriations were deemed successful, it is hardly certain that repatriation was the best option. For example, if a cloud deployment was poorly architected and/or based almost entirely on lift-and-shift workloads, could those workloads have been refactored to cloud-native instead of returned to a datacenter? We have seen savings of 90% and more in such cases. To extrapolate the “savings realized” in “successful” repatriation cases to the wider universe of cloud consumers, and thereby conclude that most or all cloud deployments are equivalent failures, is a basic failure of logic. The fact that these deployments were poorly architected or were better suited to run on-prem hardly means that all cloud workloads are. If the majority of cloud deployments resulted in outcomes this unfavorable, the stampede to the cloud would not have begun and would not be continuing today.
Don’t worry, you’re not wasting more than half of your infrastructure spend
For modern enterprises, the question is not “cloud versus datacenter” but “which workloads for cloud, which workloads for datacenter?” The process steps for analyzing this decision involve asking the following questions:
- Which workloads benefit from the elasticity, geo-flexibility, or technological innovation cloud offers? Which workloads can really “take off” if migrated or currently rely on innovative new services only offered in the cloud? These are the best candidates to be run on a public cloud.
- Are current or planned workloads architected to use cloud-native technologies where possible, or are they lifted-and-shifted clones of datacenter infrastructure? If they can be cloned 1:1 in a datacenter, then companies should always consider re-architecting the workload to take advantage of cloud-native technologies. For example, you can move your Hadoop cluster to the cloud as is, but we’ve seen identical queries run 73x faster in BigQuery. You could keep running on VMs, but you could save 60% by refactoring into containers. You could stay with your teraflops on CPU, but you can get an exaflop (yes, that’s 1,000,000x a teraflop) on TPUv4.
- Is the ROI of infrastructure spend in the cloud being measured and compared to a model of the same infrastructure costs on-prem? And vice versa? Regular validation should be carried out to verify that the correct mix of on-prem and public cloud workloads is being maintained. Critically, the ROI analysis must factor in the revenue opportunity costs of one alternative over the other. For example, if a workload is being considered for repatriation, the model must factor in the revenue degradation that would be imposed by eliminating the cloud’s elasticity and thereby slowing time to market, causing stock-outs instead of capitalizing on revenue blooms, etc.
- Are best-in-class practices for operating public cloud infrastructure being followed? Has a well-trained and equipped FinOps team been established?
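The repatriation question in the checklist above can be reduced to a break-even test: how much revenue degradation can the cost savings absorb before moving on-prem becomes a net loss? The sketch below is a simplified model under stated assumptions. The function names and all figures are hypothetical, and a real analysis would also need migration costs, multi-year horizons, and workload-specific revenue attribution.

```python
def break_even_degradation(cloud_cost, onprem_cost, cloud_revenue):
    """Largest fraction of revenue that can be lost before repatriation
    stops paying for itself (savings equal lost revenue at this point)."""
    savings = cloud_cost - onprem_cost
    return savings / cloud_revenue


def repatriation_net_gain(cloud_cost, onprem_cost, cloud_revenue, degradation):
    """Annual net gain (positive) or loss (negative) from repatriating,
    given an assumed fractional loss of revenue once elasticity is gone."""
    savings = cloud_cost - onprem_cost
    lost_revenue = degradation * cloud_revenue
    return savings - lost_revenue


if __name__ == "__main__":
    # Assumed workload: on-prem halves the infrastructure bill.
    cloud_cost, onprem_cost, revenue = 2_000_000, 1_000_000, 10_000_000

    limit = break_even_degradation(cloud_cost, onprem_cost, revenue)
    print(f"break-even revenue degradation: {limit:.0%}")

    for d in (0.05, 0.15):
        gain = repatriation_net_gain(cloud_cost, onprem_cost, revenue, d)
        print(f"degradation {d:.0%}: net gain {gain:,.0f}")
```

Under these assumptions, a 50% infrastructure saving is only worth taking if losing the cloud's elasticity costs less than a tenth of the workload's revenue. The smaller infrastructure is as a share of revenue, the lower that threshold falls.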
If you’re running large workloads in the public cloud, it’s not time to panic. It’s highly unlikely you are wasting half or two-thirds of your infrastructure costs by running in the cloud without any incremental benefits to show for it. By following the guidelines above, you can ensure that both your cloud and on-prem deployments are successful, without bailing out of one or the other as a result of tunnel vision on cost alone.
As the director of the FinOps group at SADA, Rich Hoyer develops and delivers services designed to help clients monitor, measure and improve the value of their Google Cloud services.