Contrary to popular belief, the most meaningful development in contemporary data architecture isn't the rising interest in the data mesh or the data fabric.
It's the merging of these two architectural approaches into a single architecture that supports both decentralization and centralization, local data ownership and universal accessibility, and top-down and bottom-up methods for achieving these advantages.
The reality is that the similarities between the architectures for a data mesh and data fabric are almost greater than their differences. They’re not competing constructs for making data available across (and between) organizations; they’re actually quite complementary in achieving this objective. When properly implemented with semantic knowledge graph technologies, they fuse into a two-tiered approach for devising reusable data products that span both business domains and the enterprise.
Top-down and bottom-up methodologies combined
In fact, many of the core ideas advocated by data mesh supporters are embraced by data fabric proponents.
The data mesh concept is simply a bottom-up philosophy for devolving responsibility for data to the respective business units or domains while deemphasizing centralized infrastructure like data warehouses. A data fabric is a top-down methodology for devolving responsibility for datasets closer to where the data are actually produced, and is purported to utilize artificial intelligence (AI) to 'magically' integrate data into a centralized version of the truth.
However, both data mesh and data fabric architectures are needed. At a higher level, a data fabric can join (across an organization) the data products of a data mesh, which locally exist at a lower level. When those data assets are well described via semantic technologies, organizations can unify these architectures while reducing costs, time to value and ETL (extract, transform, load) and ELT (extract, load, transform) utilization — while also increasing their capacity to exploit data relationships.
It’s almost impossible to implement a data fabric without using data mesh ideas and techniques. A data mesh localizes data management duties to business groups instead of combining them across domains in centralized options like data lakes and data lakehouses.
Data fabrics do the same thing: building one doesn’t involve centralizing everything into a single data warehouse, for example. On the contrary, it requires sourcing data from their respective places, implementing Service Level Agreements (SLAs) for the data, establishing domain experts for sources, then having them formalize metadata for the data so that datasets are clean, reliable and reusable. Anyone familiar with a data mesh realizes implementing one involves those same tasks.
Data mesh supporters call these curated datasets data products. The output of a data fabric is a data product too, albeit one at a higher level that exists across an organization (instead of within a single business unit).
For instance, a company may want to make SAP a source for its data fabric. The data owners for that source will make those data reusable so they’re available to the rest of the organization, but expose the data where it makes the most sense while retaining control over those assets. Data mesh adherents could (and usually do) advocate the same things for their sources.
Implementing with semantic knowledge graphs
The semantic technologies underpinning RDF knowledge graphs are primed for data mesh and data fabric architectures — and their synthesis. They’re certainly ideal for crafting data products. Semantic technologies excel at providing uniform, standards-based descriptions of data assets or products in business-friendly terminology designed for understanding and sharing them between users, systems and applications.
The crux of semantic technology is sharing models of a particular domain. Experts can create these models so that they can be reused by anyone requiring that data product — regardless of whether it's for a data mesh or a data fabric. Plus, this technology readily supports combining data products to make new ones for emergent use cases, like connecting data from different domains for a data fabric. Doing so can be as simple as merging the knowledge graphs from individual domains.
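To make this concrete, here is a minimal sketch of why merging domain knowledge graphs is so simple. RDF represents everything as subject-predicate-object triples with globally unique identifiers, so combining two graphs is just a set union; the domain names, entities and prefixes below are hypothetical, invented purely for illustration.

```python
# Illustrative sketch: RDF-style statements modeled as (subject, predicate, object)
# tuples. The "ex:" identifiers and the sales/HR domains are hypothetical.

sales_graph = {
    ("ex:Customer", "rdf:type", "owl:Class"),
    ("ex:acme", "rdf:type", "ex:Customer"),
    ("ex:acme", "ex:region", "EMEA"),
}

hr_graph = {
    ("ex:Employee", "rdf:type", "owl:Class"),
    ("ex:jdoe", "rdf:type", "ex:Employee"),
    ("ex:jdoe", "ex:manages", "ex:acme"),  # cross-domain link into the sales graph
}

# Because both graphs use shared, global identifiers, merging them for a
# fabric-level data product is simply a set union -- no schema mapping needed.
fabric_graph = sales_graph | hr_graph

# Queries can now traverse relationships that span the original domains.
managed_accounts = {o for (s, p, o) in fabric_graph if p == "ex:manages"}
```

In a production setting this union would be performed by an RDF store rather than Python sets, but the principle is the same: shared identifiers and a common triple model make cross-domain combination trivial.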
Simultaneously, semantic knowledge graph technology is optimal for implementing data fabrics. This architecture entails integrating data from a plethora of sources, data types, schema and other points of differentiation. Consequently, the resulting models become more intricate, necessitating technologies that accommodate complex relationships and descriptions for connecting those data. Semantic knowledge graphs fulfill this obligation at the higher level of abstraction (further away from the sources) that’s necessary for stitching together a data fabric.
The two-tiered architecture
Conceptually, a better way to think about the data fabric and data mesh architectures is as two tiers of a common architecture. For the first tier, a data mesh is the bottom-up approach nearest the data sources. This tier is responsible for provisioning the data, which are described with rich metadata according to semantic standards to produce reusable data products from individual business domains.
The goal is to make these localized descriptions meaningful and accessible to others across the enterprise. Semantic technologies accomplish this goal with standards for RDF, OWL and taxonomies, so datasets are readily understood by the business.
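As a rough illustration of what such a business-friendly description looks like, the sketch below models a data product's metadata using standard vocabulary terms (rdfs:label for the human-readable name, with Dublin Core and DCAT-style properties for description and ownership). The dataset name, IRIs and domain are invented for this example.

```python
# Hedged sketch: a data product described with standards-based metadata terms.
# The "ex:customer360" dataset and its values are hypothetical examples.

data_product = {
    "@id": "ex:customer360",
    "rdf:type": "dcat:Dataset",                       # it's a catalogable dataset
    "rdfs:label": "Customer 360 View",                # business-friendly name
    "dcterms:description": "Unified customer profile curated by the sales domain",
    "dcterms:publisher": "ex:salesDomain",            # the owning business domain
}

def business_label(product: dict) -> str:
    """Return the human-readable name a data catalog would display."""
    return product["rdfs:label"]
```

Because these property names come from shared W3C vocabularies rather than a proprietary schema, any catalog or consumer across the enterprise can interpret the description the same way.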
The data fabric is the top-down approach of the second, upper tier above the data mesh. As such, it integrates any data product across domains, locations and datasets, and it is pertinent for devising new data products by combining them across domains. A data fabric thus encompasses all business domains while still preserving the local business ownership of those data assets. Therefore, organizations benefit from the best elements of each architecture combined in one.
AI’s capabilities for automating the necessary data integration implicit to the data fabric architecture — and its unification with the data mesh one — have been highly exaggerated.
For data integration, AI’s functionality is still somewhat limited. Data fabric supporters claim this construct can automate data integration via metadata, which is typically involved in prudent data integrations. However, integration processes today revolve around the actual data as much as they do metadata. AI certainly has some utility in integrating data for data fabrics. But the scale, complexities and numerous distinctions between data in integration processes still require human effort in addition to machine automation.
A more practical use of AI is in automating the creation of the knowledge graphs that describe data for the two-tiered approach described above for unifying data mesh and data fabric architectures. There are numerous AI techniques for identifying connections in datasets and making intelligent suggestions about them to accelerate the population of a knowledge graph for a domain. One example is semantic inferencing, in which self-describing statements about data are combined to derive new ones.
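A tiny sketch of semantic inferencing helps show what "combining statements to derive new ones" means in practice. The example below hand-rolls two standard RDFS entailment rules (subclass transitivity, and type propagation up the class hierarchy) over a toy graph; the classes and instances are hypothetical, and a real deployment would use an RDF reasoner rather than this loop.

```python
# Minimal semantic-inferencing sketch over RDF-style triples.
# The "ex:" classes and instances are invented for illustration.

triples = {
    ("ex:GoldCustomer", "rdfs:subClassOf", "ex:Customer"),
    ("ex:Customer", "rdfs:subClassOf", "ex:Party"),
    ("ex:acme", "rdf:type", "ex:GoldCustomer"),
}

def infer(graph: set) -> set:
    """Apply two RDFS entailment rules until no new statements appear."""
    g = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in g:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in g:
                    # rdfs11: subClassOf is transitive
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
                    # rdfs9: members of a subclass belong to the superclass
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
        if not new <= g:
            g |= new
            changed = True
    return g

inferred = infer(triples)
# From three asserted statements, the rules derive that ex:acme is also
# a Customer and a Party -- new facts no one wrote down explicitly.
```

This is exactly the kind of derivation that accelerates knowledge graph population: domain experts assert a few statements, and the machinery fills in their logical consequences.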
There are also reasoning approaches typified by symbolic reasoning and OWL-based reasoning. Germane unsupervised learning techniques include various means of dimensionality reduction and clustering. Supervised learning applications include link prediction, which can be powered by graph neural networks. There is an abundance of techniques for entity resolution to determine whether an entity in one dataset is the same as, or related to, an entity in another dataset. Increasingly, these techniques rely on AI.
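To give a feel for the simplest end of the entity resolution spectrum, the sketch below pairs records from two datasets by string similarity. Production systems combine many more signals (identifiers, addresses, learned embeddings), and the company names and 0.7 threshold here are purely illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hedged sketch: naive entity resolution via string similarity.
# The record lists and the 0.7 threshold are hypothetical examples.

crm_records = ["Acme Corporation", "Globex Inc.", "Initech LLC"]
billing_records = ["ACME Corp", "Sterling Cooper", "Umbrella Co"]

def resolve(a_records, b_records, threshold=0.7):
    """Pair records whose normalized names are similar enough
    to plausibly denote the same real-world entity."""
    matches = []
    for a in a_records:
        for b in b_records:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

matches = resolve(crm_records, billing_records)
# "Acme Corporation" and "ACME Corp" are flagged as the same candidate entity.
```

Once two records are resolved to the same entity, a knowledge graph can record that equivalence explicitly (for example, with an owl:sameAs statement), so every downstream consumer benefits from the match.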
Benefits: Direct and corollary
The coalescence of the data mesh and data fabric constructs into a single, two-tiered architecture powered by semantic knowledge graphs yields distinct enterprise advantages. It greatly reduces the amount of ETL and ELT processing required to transform data.
Well-described, semantically tagged data is inherently reusable and doesn’t require additional transformation each time it's reused. Semantic technologies make data self-describing in business terminology, so once domain experts introduce those descriptions as a model, they can be endlessly reused within and across domains.
Decreased cost is another tangible benefit of this methodology and a corollary of the first benefit. Because semantic data are reusable, organizations spend less on cleansing raw data and wrangling those data into a desired form. The costs of mapping, cleansing and normalizing raw data are considerable; with semantics, this process can be done once and endlessly reap benefits — which adds up when combining data products across domains at the data fabric level.
There are also time savings from the reduced time-to-value of this approach: less time spent preparing data means faster access to analytics, insights and resulting business action. There’s also a heightened capacity to ascertain, manage and interconnect relationships among disparate datasets. This benefit ensures a far better understanding of data’s importance for data discovery and data exploration, which drastically enhances analytics and the value reaped from it.
A symbiotic relationship
The data mesh and data fabric concepts work well together to fulfill similar objectives. They localize responsibility for data to business units without conventional centralization methods, creating curated, reusable data products across an organization. A data mesh incorporates a bottom-up approach to this task, while a data fabric utilizes a top-down one.
Uniting these approaches into a single architecture creates a symbiosis for the best outcome — particularly when their implementations are streamlined and their efficacy enhanced by the rich, self-describing nature of semantic knowledge graph technologies.
Sean Martin is CTO of Cambridge Semantics.