This article was contributed by Bruno Aziza, head of data and analytics at Google Cloud
“Data mesh” is a term that most vendors, educators, and data pundits seem to have landed on en masse to define one of the most disruptive trends in the data, AI, and analytics worlds. According to Google Trends, in 2021 searches for “data mesh” overtook those for “data lakehouse,” a term that had until then been fairly popular in the industry.
Put mildly, if you work in technology, you won’t be able to escape the data mesh in 2022.
Data mesh: a simple definition
The data mesh concept originates in a paper published in May 2019 by Zhamak Dehghani. In it, the Thoughtworks consultant describes the limits of centralized, monolithic, and domain-agnostic data platforms.
These platforms often take the form of proprietary enterprise data warehouses with “thousands of unmaintainable ETL jobs, tables, and reports that only a small group of specialized people understand, resulting in an under-realized positive impact on the business,” or complex data lakes that are “operated by a central team of hyper-specialized data engineers that [have], at best, enabled pockets of R&D analytics,” according to Dehghani. The latter case is often referred to as a “data swamp”: a data lake where data of all kinds stagnates, goes unused, and is ultimately useless.
The data mesh aims to solve these issues by focusing on domain-driven design and by guiding leaders toward a “modern data stack” that strikes a balance between centralization and decentralization of metadata and data management.
There is no better guidance than that of practitioners who put theories into practice and report real-world findings from their data journeys. Francois’ two-part blog is a good example: if you haven’t read it yet, stop everything and do that now. The first part is full of useful guidance on how a data mesh can shape your data team’s composition and organization, and “Part Deux” provides tried, tested, and technical guidance on how to implement a data mesh successfully.
Remember that a data mesh is more than technical architecture; it is a way to organize yourself around data ownership and its activation. When deployed successfully, the data mesh becomes the foundation of a modern data stack that rests on six key principles. For your data mesh to work, data must be 1) discoverable, 2) addressable, 3) trustworthy, 4) self-describing, 5) inter-operable, and 6) secure.
In my opinion, a seventh dimension should be added to the data mesh concept: financially responsible and financially accurate. One of the biggest challenges (and opportunities) of a distributed and modern data stack is the true allocation of resources (and cost) to the domains.
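To make those seven properties concrete, here is a minimal sketch of how a team might record them in a data-product catalog. Everything in it is illustrative and hypothetical (the `DataProduct` descriptor, its field names, and `is_mesh_ready` are not from any specific product or from Dehghani’s paper):

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product in a mesh.

    Each field maps to one of the seven properties discussed above.
    """
    name: str                              # discoverable: registered in a catalog
    address: str                           # addressable: stable URI consumers resolve
    schema: dict                           # self-describing: schema travels with the data
    quality_checks: list = field(default_factory=list)  # trustworthy: published checks
    output_format: str = "parquet"         # interoperable: open, standard format
    access_policy: str = "restricted"      # secure: explicit access control
    owning_domain: str = ""                # financially responsible: cost maps to a domain

def is_mesh_ready(product: DataProduct) -> bool:
    """A product is mesh-ready only if every property is filled in."""
    return all([
        product.name,
        product.address,
        product.schema,
        product.quality_checks,
        product.output_format,
        product.access_policy,
        product.owning_domain,
    ])
```

The point of the sketch is the seventh field: if `owning_domain` is empty, no one can be charged for (or credited with) the product, and the descriptor fails the readiness check.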
Many will interpret this comment as a “cloud costs you more” argument. That’s not what I’m referring to. In fact, I believe that cost shouldn’t be evaluated in isolation. It should be correlated with business value: if your company can get exponentially more value from data by investing in a modern (and responsible) data mesh in the cloud, then you should invest more.
The biggest issues in this field haven’t been about lack of data or lack of investment. They have been about the lack of value. According to Accenture, close to 70% of organizations still can’t get value from their data.
Don’t get distracted by the hype
If your ultimate goal is to drive “business value” from data, how does the data mesh concept help you? One of your biggest challenges this year will probably be to avoid getting caught in the buzzword euphoria that surrounds the term. Instead, focus on using the data mesh as a way to get to your end goal.
There are two key concepts to consider:
The data mesh isn’t the beginning
In a recent piece, my friend Andrew Brust noted that “dispersal is operational data’s natural state” and that “the overall operational data corpus is supposed to be scattered. It got that way through optimization, not incompetence.” In other words, the data you need is supposed to live in a distributed state. It will be on-premises, it will be in the cloud, it will be in multiple clouds. Ask your team: “Have we taken inventory of all the data we need? Do we understand where it all lies?”
Remember that, per the original paper by Dehghani, in order for your data mesh to work, your data needs to be “discoverable, addressable, trustworthy, self-describing, inter-operable and secure.” This presupposes that there is a stage before the data mesh stage.
I have the honor of spending a lot of time with many data leaders, and the best description I’ve heard of that stage is the “data ocean,” from Vodafone’s Johan Wibergh and Simon Harris. The data ocean is wider than the landlocked data lake: it focuses on securely giving data teams full visibility into the entire data estate so they can realize its potential, without necessarily moving the data.
The data mesh isn’t the end
Now that we’ve established that the data mesh needs a data foundation to operate successfully, let’s explore what the data mesh leads you to. If your goal is to generate value from the data, how do you materialize the results of your data mesh? This is where data products come into play.
We know that value from data comes from its usage and its application. I’m not referring to simple dashboards here. I’m referring to intelligent and rich data products that trigger actions to create value and protect your people and business. Think about anomaly detection for your networks, fraud prediction for your bank accounts, or recommendation engines that create superior customer experiences in real time.
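As a toy illustration of the first example above, the core of an anomaly-detection data product can be as simple as flagging points that sit far from the mean of a metric stream; a real product would run continuously over live network telemetry and trigger an action. This is a minimal sketch under that assumption, with hypothetical names:

```python
from statistics import mean, stdev

def detect_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean.

    A toy stand-in for the anomaly-detection products described above:
    it returns the outliers rather than triggering an alert.
    """
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # a flat signal has no outliers
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

The value here is not the statistics but the packaging: wired to a live metric feed and an alerting action, even a function this simple becomes a product that protects the business, rather than a chart someone has to watch.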
In other words, while the data ocean is the architectural foundation required to set your data mesh up for success, the data mesh itself is the organizational model that enables your team to build data products. If every company is a “data company,” its currency is the data products it can output, repeatably and reliably. This is the concept McKinsey Analytics has coined the “data factory.”
What should you be worried about?
As you read more about the data mesh concept throughout the year, you will most likely hear from three types of people: the disciples, the distractors, and the distorters.
The disciples will encourage you to go back to the original paper or even contact Dehghani directly if you have questions. You can also order her book, which is coming out soon.
The distractors will be pundits or vendors who will want to label the concept of the “data mesh” as a fad or an old trend: “Look away!” they’ll say, “there is nothing new here!” Be careful. Newness is relative to your current state. Go back to the genesis and decide for yourself if this concept is new to you, your team, and your organization.
The distorters will likely be vendors (of software or services) who get a direct benefit from drawing a straight line from the Dehghani paper to their product, solution, or services. Watch out. As my friend Eric Broda explains in his data mesh architecture blog, “there is no single product that brings you the data mesh.”
The best solution, in my opinion, is to connect with practitioners: leaders who have put the theory into practice and are willing to share their learnings.
Bruno Aziza is head of data and analytics at Google Cloud.