You’ve probably heard of “data gravity” and how it can inhibit a hybrid strategy. The idea is that as you amass and store data in one particular cloud, that body of data exerts a gravitational pull on the apps and services that orbit around it, making it increasingly hard to move that data to another cloud. But data gravity doesn’t have to keep an organization from adopting a multicloud or hybrid-cloud strategy. In fact, it can be a reason to adopt one.

In the oft-used analogy, if compute infrastructure is the machinery of today’s world, then data is the oil — meaning infrastructure is not productive without it. It does not make sense for applications and services to run where they do not have quick and easy access to data, which is why data exerts such a gravitational pull on them. When services and applications are closer to the data they need, users experience lower latency and applications experience higher throughput, leading to more useful and reliable applications.

Simplistically, one could be tempted to locate all data, and the applications within its orbit, in a single location. But regulatory and latency concerns are two reasons why this is not realistic for most global enterprises.

A single public cloud is a pipe dream

The idea that a single public cloud will solve all of your problems is a pipe dream that no organization can realistically make work (nor would it want to). Sure, working with only one vendor may sound easier in theory: one bill to pay and the same underlying infrastructure for all of your applications. But between the demand for edge computing, the need to comply with data sovereignty regulations, and the general need to be nimble and flexible, keeping all of your data in one cloud is simply not practical for a business that wants to compete in today’s market.

It’s true that some processing is best done in a central location. Model training for artificial intelligence and machine learning (AI/ML) applications, for example, thrives on massive amounts of data, because more training data generally improves model accuracy. However, AI/ML inference frequently can’t be done at the core and needs to happen at the edge instead. For example, a manufacturer in Thailand that relies on data from heat sensors on the factory floor needs to analyze that data in real time for it to have value. That decision-making has to happen close to where the data is generated to meet the business requirements and make an impact on operations.

The challenges of data gravity

One of the most obvious challenges of data gravity is vendor lock-in. As you amass more data in one location, and more of your apps and services rely on that data, it becomes increasingly difficult, not to mention costly, to move out of that original location.

In our Thai factory example, some apps must move to where the data is generated in order to meet latency requirements. The governing constraint is the latency budget: the maximum time available to process the data while it can still have an impact or be useful to the end user. The data must sit close enough to the app for processing to fit within that budget. When the latency between apps and data exceeds the budget, the resulting loss of responsiveness can greatly hinder your organization.

Take, for instance, a smart-city application such as license plate recognition for border control. Once a plate is scanned, some apps, such as those checking for AMBER alerts or stolen vehicles, must respond in near real time to fit within the latency budget. If the analysis takes longer than the budget allows, the result is much less meaningful; if the data and apps are too far apart, the scanning becomes useless.
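To make the latency-budget idea concrete, here is a minimal Python sketch that adds up the time spent in each stage of a request and checks the total against a budget. The stage names, the 200 ms budget, and every timing below are hypothetical values chosen for illustration, not measurements from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step on the path from sensor to decision (hypothetical)."""
    name: str
    millis: float


def total_latency(stages: list[Stage]) -> float:
    """Sum the time spent in every stage of the request path."""
    return sum(stage.millis for stage in stages)


def fits_budget(stages: list[Stage], budget_ms: float) -> bool:
    """True if end-to-end latency stays within the budget."""
    return total_latency(stages) <= budget_ms


# Hypothetical timings for a license plate check. Both placements share
# the same capture, inference, and lookup costs; only the network round
# trip to the data differs.
common = [Stage("capture", 30.0), Stage("inference", 60.0), Stage("hot-list lookup", 10.0)]
edge = common + [Stage("round trip to local site", 5.0)]
remote = common + [Stage("round trip to distant region", 180.0)]

BUDGET_MS = 200.0  # assumed budget; real budgets depend on the use case

for label, path in (("edge", edge), ("remote", remote)):
    print(f"{label}: {total_latency(path):.0f} ms, fits={fits_budget(path, BUDGET_MS)}")
```

With these assumed numbers the edge placement totals 105 ms and fits, while the remote one totals 280 ms and does not. The compute work is identical in both cases; distance to the data alone decides whether the result arrives in time.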

Beyond business requirements and expectations for quick response times, data sovereignty laws also regulate where data can be stored and whether it can cross jurisdictional boundaries. Many countries prohibit exporting certain types of data beyond their borders. The average Global 2000 company operates in 13 countries. If that is the reality for your organization, you can’t easily move data while abiding by those laws. And if you take the time to anonymize the data to meet sovereignty requirements, your latency budget goes out the window, making it a no-win situation.
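Sovereignty and latency together narrow down where an app can run. The Python sketch below keeps only the candidate sites that sit inside the data’s home jurisdiction (when the data is regulated) and are close enough to meet the latency budget. The site names, country codes, round-trip times, and the rule that regulated data may never leave its country of origin are all hypothetical simplifications; real sovereignty rules vary by country and by type of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A candidate place to run an app (hypothetical names and numbers)."""
    name: str
    country: str   # ISO 3166-1 alpha-2 code of the hosting jurisdiction
    rtt_ms: float  # assumed round-trip time from the data source


def eligible_sites(sites: list[Site], data_country: str,
                   regulated: bool, budget_ms: float) -> list[str]:
    """Return names of sites that satisfy both constraints.

    Simplifying assumption: regulated data may not leave data_country at all.
    """
    result = []
    for site in sites:
        if regulated and site.country != data_country:
            continue  # would move regulated data across a border
        if site.rtt_ms > budget_ms:
            continue  # too far from the data to respond in time
        result.append(site.name)
    return result


sites = [
    Site("bangkok-edge", "TH", 4.0),
    Site("singapore-region", "SG", 35.0),
    Site("us-east-region", "US", 220.0),
]

print(eligible_sites(sites, data_country="TH", regulated=True, budget_ms=50.0))
print(eligible_sites(sites, data_country="TH", regulated=False, budget_ms=50.0))
```

For the hypothetical Thai factory data, only the local edge site qualifies when the data is regulated; relaxing the residency rule adds the nearby regional cloud, but the distant region still fails on latency.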

Address data gravity with the hybrid cloud

Inevitably, wherever you store data, it will pull apps and services to it. But data sovereignty and latency time budgets for edge applications almost guarantee that multinational companies cannot operate with a simple single-cloud strategy.

With a hybrid cloud infrastructure, organizations can place apps and services near the data they depend on, addressing both latency problems and data sovereignty requirements. The key to making this work is to use a common operating environment, such as a Kubernetes platform, across these various clouds and data center locations. If your organization is maintaining applications for many different operating environments, the associated complexity and costs may kill you competitively.

Organizations can use a mix of AWS, Azure, Google Cloud Platform, VMware, on-premises infrastructure, and more, but they need a way to make apps and services portable between them. With a common operating environment, you can write applications once, run them where it makes sense, and manage your whole environment from one console.
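One way to picture “write once, run where it makes sense” on a Kubernetes platform is to define an application a single time and vary only where it is scheduled. The Python sketch below builds the same `apps/v1` Deployment for several locations, pinning each copy with a `nodeSelector` on the well-known `topology.kubernetes.io/region` node label. The app name, container image, and region values are hypothetical, and whether your nodes carry that label with those values depends on your providers and cluster setup. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def deployment_for_region(app: str, image: str, region: str, replicas: int = 2) -> dict:
    """Build one Kubernetes apps/v1 Deployment pinned to nodes in `region`.

    The app name, image, and region are illustrative placeholders.
    """
    labels = {"app": app, "region": region}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{app}-{region}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only onto nodes labeled with this region.
                    "nodeSelector": {"topology.kubernetes.io/region": region},
                    "containers": [{"name": app, "image": image}],
                },
            },
        },
    }


# The same application definition, placed near each hypothetical data location.
for region in ("thailand-edge", "frankfurt-dc"):
    manifest = deployment_for_region(
        "plate-reader", "registry.example.com/plate-reader:1.0", region
    )
    print(json.dumps(manifest, indent=2))
```

Only the placement differs between the generated copies; the container image and its configuration stay identical, which is what a common operating environment buys you when the data, and therefore the apps, are spread across locations.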

Setting up for success now

In the coming years, datasets will only continue to grow exponentially, especially as organizations rely more on AI and machine learning applications. According to IDC, worldwide data creation will grow to 163 zettabytes by 2025, ten times the 16.1 zettabytes generated in 2016.

That means any challenges brought on by the gravitational pull of your data will only get worse if you don’t set yourself up for success now. With a hybrid cloud infrastructure, you can build a constellation of data masses that complies with the growing number of data sovereignty laws worldwide, and process and analyze data in the edge locations that make sense for the business and its end users.

Brent Compton is Senior Director of Data Services and Cloud Storage at Red Hat.

