
A common pattern in analytic ecosystems today sees data produced in different areas of the business pushed to a central location. The data flows into data lakes and is cordoned in data warehouses, managed by IT personnel. The original producers of the data, often subject-matter experts within the business domain, effectively lose control or become layers removed from data meaningful to their work. This separation diminishes the data’s value over time, with data diverted away from its business consumers. Imagine a new model that flips this ecosystem on its head by breaking down barriers and applying common standards everywhere. 

Consider an analytics stack that could be deployed within a business domain; it remains there, owned by team members in that business domain, but centrally operated and supported by IT. What if all data products generated there were completely managed within that domain? What if other business teams could simply subscribe to those data products, or get API access to them? An organizational pattern called data mesh, which promotes this decentralization of data product ownership, has received a great deal of attention recently. However, what ecosystem architectures are well suited to providing the technical backbone for a data mesh, and which of them can deal with the emerging patterns of data growth?

As data volumes grow, moving data to a centralized location for processing becomes more expensive and time-consuming, particularly if that data is generated outside a traditional data center or public cloud. Instead, enterprises will increasingly prefer to deploy analytics processing to the places where the data is generated. The ability to easily geolocate data for latency, compliance or security reasons will make computing more sustainable, efficient, and logical. That is the territory of the distributed data cloud. Seamlessly controlling data anywhere is how enterprises take advantage of the incredible data growth that’s upon us.

The distributed data cloud is not a single tool or platform, but an ecosystem pattern that gets data to the right place and the right person at the right time in a secure, governed, and trusted way. It includes a federated collection of data management and analytics services spanning public clouds, private clouds, and the network edge. 

Managed from a single control plane, a distributed data cloud enables analytic applications to be provisioned at the point of need on a right-sized blend of physical and virtualized infrastructure, based on data gravity, data sovereignty, data governance, and latency requirements. 
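The placement decision described above can be illustrated with a minimal sketch. It is not taken from any product: the `Site` and `Workload` types, their fields, and the tie-breaking rule are all assumptions made for illustration. Sovereignty and latency are treated as hard constraints, and data gravity (how much relevant data already sits at a site) picks among the sites that remain.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Site:
    """A hypothetical deployment location (public cloud, private cloud, or edge)."""
    name: str
    country: str          # where the infrastructure physically sits
    latency_ms: float     # round-trip latency to the data consumers
    local_data_tb: float  # relevant data already resident here (data gravity)


@dataclass(frozen=True)
class Workload:
    """A hypothetical analytic application with placement requirements."""
    name: str
    allowed_countries: frozenset  # sovereignty / governance constraint
    max_latency_ms: float         # latency requirement


def choose_site(workload: Workload, sites: Iterable[Site]) -> Optional[Site]:
    """Return the eligible site holding the most data, or None if none qualifies."""
    eligible = [
        s for s in sites
        if s.country in workload.allowed_countries
        and s.latency_ms <= workload.max_latency_ms
    ]
    if not eligible:
        return None
    # Prefer the site with the most resident data; break ties on lower latency.
    return max(eligible, key=lambda s: (s.local_data_tb, -s.latency_ms))
```

A real control plane would weigh many more factors, such as cost and capacity. The point of the sketch is only that compute is moved to the data, within the limits that governance sets.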

Several major trends will drive businesses to embrace the full value of their data with this model, where infrastructure functions to democratize data, not imprison it.   

Edge Computing Strains Internet Capacity

It’s reliably predicted that by 2025, 75% of enterprise-generated data will be created and processed outside the traditional centralized data center or cloud, up from less than 10% in 2019. The explosion of data and devices at the edge, the rollout of 5G, and planning for 6G (100 Gbps networks over the next 10 years) have hastened the realization that the internet backbone doesn’t have enough capacity to backhaul all of the data activity at the edge to centralized data centers for analysis.

Distributed Cloud Answers Hybrid Drawbacks

The Gartner Top Strategic Technology Trends for 2021 report suggests that the distributed cloud is emerging to address location-impacted latency. The distributed cloud is the infrastructure-as-a-service precursor needed to implement the distributed data cloud platform described in this article. Distributed cloud means deploying cloud software and hardware stacks outside a public cloud provider’s data center to provide a mesh of interconnected cloud resources. These stacks allow businesses to run applications developed for the public cloud in a company’s own data center and in other locations, such as multi-access edge computing centers connected to 5G cell tower groups, or on the factory floor in support of IoT applications in manufacturing. Enterprises still benefit from the value proposition of public cloud and guaranteed SLAs.

Both hybrid cloud and hybrid IT break the fundamental value propositions of cloud. Namely, it is very difficult to run hybrid efficiently while fully leveraging the scale and elasticity that public cloud services offer. Hybrid does not yield the efficiencies in cloud operations, governance, and updates that public cloud offers, nor do hybrid systems keep pace with innovation in public clouds. Distributed cloud means the same seamless cloud experience everywhere.

Mobile and Multi-Experience Hyper-Personalize Business

Enterprises ultimately want to put interactive, predictive analytics in the hands of the actual consumer. To that end, rather than data warehouses serving a user community of a thousand, data warehouses ultimately will serve a user community of millions of end consumers. The current ubiquity of mobile device usage gives an idea of where multi-sensory, multi-device, multi-touchpoint enterprise experiences with data are heading. The computer is fast becoming the environment around the user.  

An increasingly API-driven culture everywhere, seamless UX/UI, and democratized data access throughout enterprises will power the shift toward hyper-personalized, real-time interactions among people, places, and things.

Among the First Use Cases

With these trends spurring the advent of the distributed data cloud, several use cases are on the immediate horizon.  

First, there’s a widespread need for simplified hybrid and multi-cloud operations that feature a consistent environment in the public cloud, on-premises, and at the edge. A compelling reason for this, particularly in regulated industries such as banking, is to help reduce cloud concentration risk by distributing data and analytics across more than one cloud provider or data center. To achieve this using a distributed data cloud, an enterprise can provision containerized data management and analytics applications and run them anywhere that Kubernetes is deployed: in a public cloud, on-premises, or at the edge. Everything happens via the same management UX and DevOps processes, and from the same web console and APIs.
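As a rough sketch of what "the same application anywhere Kubernetes is deployed" can look like, the snippet below builds one Kubernetes Deployment manifest as a plain Python dictionary and plans the identical manifest for several targets. The helper names, the target names, and the image path are hypothetical. Nothing here calls a Kubernetes API; applying each manifest to a cluster is left to whatever tooling an enterprise already uses.

```python
import copy


def analytics_deployment(name, image, replicas=1):
    """Build a standard apps/v1 Deployment manifest for a containerized app."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def plan_rollout(manifest, targets):
    """Pair every target with its own copy of the same manifest."""
    return {target: copy.deepcopy(manifest) for target in targets}


# Hypothetical targets: one public cloud, one on-premises cluster, one edge site.
manifest = analytics_deployment("analytics", "registry.example.com/analytics:1.0")
plan = plan_rollout(manifest, ["public-cloud", "on-prem", "edge"])
```

Because every target receives the same definition, differences between environments are confined to the clusters themselves rather than to the application packaging.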

Second, processing personally identifiable information (PII) in the country of residence is a scenario where localized access and regulatory compliance make moving compute to the data the best solution. For example, running an instance optimized for distributed data cloud in individual hospitals, on a public cloud stack co-located with each hospital, allows patient data to remain at the source.

A third use case where the need is already skyrocketing involves IoT analytics. The ability to perform secure analytics at the network edge and close to consumers via a distributed data cloud means real-time answers for connected cars, smart cities, energy grids, and much more. Running optimized analytics on AWS Wavelength, for example, in a multi-access edge environment to monitor network quality in real time will be entirely doable.

Bringing to life a distributed data cloud — where data anywhere is easily managed and put to work — is not a single-vendor play and it probably never will be. Rather, a consortium of companies that come together around this idea and work symbiotically will bring the party to the data and success to businesses ready to grasp a more logical future.

Mark Cusack is the CTO at Yellowbrick
