Data estates: Creating an architecture that’s built to last

Most of us are familiar with the concept of urban sprawl. If a city has a high degree of sprawl, significant time and money are spent on commuting, highways are always congested, and commuters show up to work frazzled, which can lead to errors in their work. At the same time, the amenities available to homes and businesses may be minimal because of lower densities.

The same concept applies to data estates. Data lakes and enterprise data warehouses have long been known as “slow IT platforms” that rarely serve the agility needs of the business because they cannot manage mixed workloads. Technical constraints in legacy data architectures have forced corporations to sprawl out their data estates to serve the different workload and priority needs of various parts of the business.

But data sprawl, like urban sprawl, can be solved. Even though each business unit, from marketing to operations, functions independently, each depends on high-quality data that is shared across the business to serve, protect, and enrich the collective.

With the challenges that legacy data architecture presents and the emergence of new technologies, businesses must adopt a planning mindset for their data estates. Here are three ways businesses can simplify them:

1. Build a simple but comprehensive data cloud as the connective tissue across all digital systems

Adopting a data cloud approach, in which a centralized common data store serves multiple independent urban centers of insight, dramatically simplifies the data estate. Reducing sprawl requires increasing application or insight density per data platform. Organizations with high application densities can increase the number of high-value “amenities,” such as automation, compliance, audit, and privacy, available to each application that uses the common stack. They can also unlock more value by minimizing transmission loss, fidelity loss, and loss of trust.

2. Allow business unit independence

The introduction of a data cloud that separates compute from storage allows organizations to deploy business unit-specific compute clusters on top of an “enterprise data set.” Each line of business can have its own reporting and/or deep learning cluster that runs independently of other workloads. Companies will be able to run or tweak multiple queries and workloads, which was previously cumbersome in a shared compute environment.

3. Minimize data duplication

Low application densities lead to tremendous amounts of waste. When analyzing data estates at client companies, I sometimes see large portions of their data across platforms originating from the same sources. Centralizing data storage while providing computing independence could therefore keep high volumes of data from being duplicated many times. Reducing data duplication reduces storage needs, and reducing the number of copies and variants of the same data increases data trust tremendously. Data duplication leads to wasted infrastructure and labor that could be redirected from wrangling data to creating positive business outcomes.
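As a rough illustration of how this kind of duplication might be quantified, the sketch below totals the storage held per source system across platforms and compares it with a single centralized copy. The inventory, platform names, source names, and sizes are all hypothetical, and the assumption that one copy per source (sized as the largest existing copy) would suffice is a simplification, not a measurement method taken from any client analysis.

```python
from collections import defaultdict

# Hypothetical inventory of a sprawled data estate:
# (platform, source system, size in TB). All values are illustrative.
INVENTORY = [
    ("marketing_warehouse", "crm", 40),
    ("operations_lake", "crm", 40),
    ("finance_mart", "crm", 40),
    ("operations_lake", "erp", 100),
    ("finance_mart", "erp", 100),
    ("marketing_warehouse", "web_logs", 60),
]


def duplication_report(inventory):
    """Estimate how much stored data is redundant across platforms."""
    sizes_by_source = defaultdict(list)
    for _platform, source, size_tb in inventory:
        sizes_by_source[source].append(size_tb)

    # Everything currently stored, counting every copy on every platform.
    total_tb = sum(sum(sizes) for sizes in sizes_by_source.values())

    # Assumption: a centralized store keeps one copy per source,
    # sized as the largest copy that exists today.
    centralized_tb = sum(max(sizes) for sizes in sizes_by_source.values())

    redundant_tb = total_tb - centralized_tb
    return {
        "total_tb": total_tb,
        "centralized_tb": centralized_tb,
        "redundant_tb": redundant_tb,
        "redundant_share": redundant_tb / total_tb if total_tb else 0.0,
    }


report = duplication_report(INVENTORY)
print(report)
```

In this made-up inventory, three platforms each hold a copy of the CRM data and two hold the ERP data, so close to half of the stored volume is redundant. A real assessment would also need to account for derived variants and partial copies, which this sketch ignores.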

Just as with urban sprawl, replanning a data estate to reduce sprawl requires a clear roadmap and a well-executed approach. Adopting data clouds in which data and computing are separate, while reducing data estate sprawl and increasing application density per platform, can improve business outcomes at all levels of the organization. A smaller data estate footprint is not only good for the business but also supports an organization’s sustainability goals.

Goutham Belliappa is VP of AI Engineering at Capgemini North America.
