Data is critical to business success in today’s world, and a solid data management foundation is the key to taking advantage of growth opportunities. But one of the biggest challenges facing data professionals is fully understanding their organizations’ complex data estates.
Most companies are eager to apply advanced analytics and machine learning to generate insights from data at scale. Yet they are also struggling to modernize their data ecosystems. Too often, data is stored all over the place in legacy systems. Or it is too hard to find in tech stacks cobbled together through years of acquisitions.
A recent Forrester study commissioned by Capital One confirmed these challenges in seeing, understanding and using data. In a survey of data management decision-makers, nearly 80% cited a lack of data cataloging as a top challenge. Almost 75% saw a lack of data observability as a problem.
In data management, out of sight is out of mind
Data that’s out of sight doesn’t generate value for your organization. That’s why it’s so important to bring data out of the darkness and make it more visible and usable. For example, data cataloging plays a critical role in understanding data, its use and ownership. When data professionals adopt more holistic approaches to cataloging, observability and governance, they can better unlock the data’s value to improve business outcomes.
Hundreds of companies provide different capabilities in data cataloging, data quality, ETL, data loading and classification. We don’t need more disruption here. We need simplification. The pain point is the complexity that data analysts and engineers face in getting specific jobs done, such as publishing, finding or trusting a dataset. Right now, that can involve going through multiple tools owned by different teams with their own required approvals.
We need a simplified experience layer so that users need only answer a few questions, and then the data is published without any backend integration. If that experience can happen seamlessly and comply with policy guidelines, working with data won’t be a burden. All kinds of great experiences will emerge, including faster time-to-market and fewer duplicative efforts within the organization.
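To make the idea concrete, here is a minimal sketch of what such an experience layer might look like in Python. Everything in it is an assumption for illustration: the `PublishRequest` questions, the two policy rules, the retention limits and the in-memory `Catalog` are hypothetical and do not describe any particular product. The point is only that the user answers a few questions and a central set of policy checks decides whether the dataset gets published.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PublishRequest:
    """The few questions a user answers (hypothetical fields)."""
    dataset_name: str
    owner_team: str
    contains_pii: bool
    retention_days: int


# A policy rule returns an error message, or None when the request complies.
PolicyRule = Callable[[PublishRequest], Optional[str]]


def require_owner(req: PublishRequest) -> Optional[str]:
    # Assumed rule: every dataset must name an owning team.
    return None if req.owner_team.strip() else "owner_team is required"


def limit_retention(req: PublishRequest) -> Optional[str]:
    # Assumed limits for illustration only: 365 days with PII, 3650 without.
    limit = 365 if req.contains_pii else 3650
    if not 0 < req.retention_days <= limit:
        return f"retention_days must be between 1 and {limit}"
    return None


# Centrally owned policy list, shared by every business team.
CENTRAL_POLICIES: List[PolicyRule] = [require_owner, limit_retention]


@dataclass
class Catalog:
    """In-memory stand-in for a shared catalog."""
    entries: Dict[str, PublishRequest] = field(default_factory=dict)

    def publish(self, req: PublishRequest) -> List[str]:
        """Register the dataset if it passes every policy; return violations."""
        violations = []
        for rule in CENTRAL_POLICIES:
            message = rule(req)
            if message is not None:
                violations.append(message)
        if not violations:
            self.entries[req.dataset_name] = req
        return violations
```

In this sketch, a compliant request lands in the catalog and `publish` returns an empty list, while a non-compliant one is rejected with the list of rules it broke, so the user never has to touch the tools behind the checks.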
Getting to this future state requires discipline, focused investment and buy-in from the top. Still, companies have a range of tools and approaches at their disposal to achieve a well-managed data estate that delivers real business impact and scales as data sources and products expand.
For most data leaders, the first move is migrating to the cloud. Gartner forecasts cloud end-user spending to hit $600 billion next year, up from nearly $411 billion in 2021. Companies know they can do a lot more with their data in the cloud, and moving there can relieve the pressure on centralized teams that manage the most critical data components on-prem. Moving to the cloud can alleviate data bottlenecks, but it also vastly increases the variety of data coming in, from far more sources, with a greater need to analyze it quickly. That can put you back in a bottleneck situation and raise tensions between central IT and business teams.
One model I champion is to federate data management to the lines of business, with a central tool to manage costs and risks. You can let business teams move at their own pace while the central shared services team ensures the platform is well-managed and highly observable.
It’s important to consider the different ways business teams produce and use data. You need to build flexibility into the tools. If you don’t, you risk these teams finding another channel to do the work. When that happens, you lose visibility and cannot guarantee that all business teams are complying with governance policies. A federated data approach with centralized tooling and policy avoids excessively centralized control, without decentralizing everything to the point where you risk cost overruns and data security problems.
Federating the data also gives data producers, consumers, risk managers and underlying platform teams a single source of truth. That’s where that simplification layer comes in again: having one place where data analysts and scientists know they can find what they need. Everyone has the same UI layer, tooling and policies, so they know they’re publishing according to the guidelines.
Last, ensure that your data analysts and scientists have a clear “productization” path out of the sandbox environment in which they did their work. If something important comes out of their analytics, you have to give them an easy way to wrap that work in the proper data governance policies while getting it into production. Otherwise, you can end up with shadowy, ungoverned pseudo-production datasets running in unstable environments.
Data is power, but it comes with great responsibility. Building data trust through greater visibility, consistency and platform simplification is a necessary foundation for creating the modern data ecosystem.
Salim Syed is VP and Head of Engineering at Capital One Software.