Join top executives in San Francisco on July 11-12 to hear how leaders are integrating and optimizing AI investments for success.
Data is the new oil, but raw data has little value on its own. Like oil, data assets must be gathered completely and accurately, then sent through a series of refining processes to create value for end users. This is the general data lifecycle, an area where artificial intelligence (AI) is set to play a major role for enterprises.
Initially, managing the data lifecycle was a task small enough to be handled manually by a team of experts. The volume of information was modest, the sources numbered just a handful and the possible applications were limited. But with the transition to the cloud and the introduction of new sources, both the volume and the diversity of data have surged.
“Data management is no longer wholly focused on relational data,” Adam Ronthal, research VP in Gartner’s ITL data and analytics group, told VentureBeat. “Document, graph, time-series, wide-column, key-value, ledger and other targeted data stores all provide specific optimizations for different types of data, and different use cases. Sometimes, these are combined in a single data management platform — a multimodel database; sometimes, they remain as best-fit, targeted point solutions.”
This increase in the volume and diversity of information has rendered traditional ways of managing data ineffective. Today, a company that selects, manages and optimizes (cleaning and enhancing) each dataset component individually will waste a great deal of time and capital; cleaning and transformation alone can take days or weeks.
The situation is comparable to Yahoo's early reliance on human experts to manually evaluate and catalog a deluge of web pages. The company dedicated plenty of resources but could evaluate only a small portion of the internet, and it struggled to keep those evaluations up to date.
Bringing AI into data management
Just as Google's automated algorithms wrested internet dominance from Yahoo by evaluating web pages more quickly and at vastly lower cost, AI is now set to revolutionize the data lifecycle.
According to Ronthal, applications of AI in data management rely on metadata analysis and activation. This allows the model to detect deviations in data usage from system design and (ideally automatically) correct them. This is augmented data management: using AI/ML to automate and optimize data management, allowing organizations to spend less time managing and optimizing infrastructure and more time building core business value.
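The deviation-detection idea can be sketched concretely. The example below is illustrative only (the schema format, column names and sample rows are invented for the demo, not any vendor's API): it compares observed values against declared metadata and reports type drift that an augmented system could then, ideally, correct automatically.

```python
# Illustrative sketch: detect schema drift by comparing observed data
# against declared metadata. The schema dict is a hypothetical format.

EXPECTED_SCHEMA = {"user_id": int, "signup_date": str, "spend": float}

def detect_drift(rows):
    """Return (column, expected_type, observed_type) tuples for deviations."""
    deviations = []
    for row in rows:
        for column, expected_type in EXPECTED_SCHEMA.items():
            value = row.get(column)
            if value is not None and not isinstance(value, expected_type):
                deviations.append(
                    (column, expected_type.__name__, type(value).__name__)
                )
    return deviations

rows = [
    {"user_id": 1, "signup_date": "2023-01-05", "spend": 19.99},
    {"user_id": "2", "signup_date": "2023-01-06", "spend": 5},  # drifted types
]
print(detect_drift(rows))  # → [('user_id', 'int', 'str'), ('spend', 'float', 'int')]
```

A real augmented system would go further, inferring the expected schema from historical metadata rather than hardcoding it.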
Many organizations have already started using AI- and ML-driven techniques across various components of data management, bringing improvements in speed and cost-efficiency.
For instance, in January 2023, Google and Aible, a company bringing an AI-first approach to the data journey, worked with a Fortune 500 enterprise and enabled it to analyze over 75 datasets with over 100 million rows of data across 150 million variable combinations. The total compute cost: $80, less than a thousandth of the cost of traditional methods.
Aible also published 25 case studies with Intel highlighting how enterprises across geographies and verticals benefited from AI in under 30 days and drove value across functions.
Overall, Ronthal notes, AI augmentation can have an impact on multiple disciplines of data management, including:
- Metadata management: Here, AI and ML can be used to explore and define the data’s metadata, evaluating metadata faster and more accurately, with reduced redundancy. Similarly, augmented data management functions can automatically catalog data elements during data extraction, access and processing.
- Data integration: AI can be used to automate the integration development process, by recommending or deploying repetitive integration flows, such as source-to-target mappings.
- Data quality: AI and ML can be used to extend profiling, cleansing, linking, identifying and semantically reconciling master data in different data sources.
- DBMS: In addition to enhancing performance and cost-based query optimization, AI and ML can automate many current manual management operations, including managing configurations, elastic scaling, storage, indexes and partitions, and database tuning.
- FinOps: AI and ML can be applied to budget and cost optimization problems and make recommendations about resource usage, pricing models, and second- and third-order effects of making changes in highly interconnected environments.
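The data-quality item above can be illustrated with a minimal sketch: a z-score check that flags values far from the column mean for human review. The threshold and sample data are assumptions for the example; production data-quality tools use far richer models than this.

```python
from statistics import mean, stdev

def flag_outliers(values, z_threshold=3.0):
    """Flag values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Mostly well-behaved order amounts with one corrupt entry.
amounts = [20.0, 21.5, 19.8, 22.1, 20.4, 21.0, 19.9, 9000.0]
print(flag_outliers(amounts, z_threshold=2.0))  # → [9000.0]
```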
Priya Krishnan, head of product management for data and AI at IBM, highlighted similar applications.
“AI is being used to ingest, identify and classify datasets from a variety of sources,” she said. “It continuously mines content to surface unseen patterns and trends, providing organizations with greater visibility and actionable insights to aid in decision-making. Businesses are using AI to automate otherwise manual tasks like data capture, de-duplication, anomaly detection and data validation. They are also training models to apply regulatory policies and ethical standards automatically, ensuring those principles are embedded from the beginning.”
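One of the manual tasks Krishnan mentions, de-duplication, can be sketched with a toy fuzzy-matching example. The similarity threshold and record format here are assumptions for illustration, not IBM's approach:

```python
from difflib import SequenceMatcher

def dedupe(records, threshold=0.9):
    """Keep each record unless it closely matches one already kept."""
    kept = []
    for record in records:
        candidate = record.lower().strip()
        near_duplicate = any(
            SequenceMatcher(None, candidate, k.lower().strip()).ratio() >= threshold
            for k in kept
        )
        if not near_duplicate:
            kept.append(record)
    return kept

customers = ["Acme Corporation", "ACME Corporation ", "Globex Inc", "Acme Corp."]
print(dedupe(customers))  # → ['Acme Corporation', 'Globex Inc', 'Acme Corp.']
```

Note that "Acme Corp." survives the 0.9 threshold; tuning that trade-off between false merges and missed duplicates is exactly the kind of work ML models automate at scale.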
A few roadblocks
While AI can be a handy resource for managing the data lifecycle, not every organization has a dedicated team of expert data scientists who can build models that are responsible, secure and non-biased as well as compliant with regulatory and ethical principles.
This is where companies should look to second-generation tools that make AI implementation easier for tasks like data preparation, prediction and forecasting.
“You no longer need to be a data engineer or data scientist to do complex data transformations — you can generate them with a large language model (LLM),” Jon Reilly, COO and cofounder of no-code AI company Akkio, which recently debuted a GPT-3-based data preparation tool, told VentureBeat.
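The pattern Reilly describes, a plain-language request in and an executable transformation out, can be sketched with the model call stubbed. The `fake_llm` function below is a stand-in for a real LLM API (a deliberate assumption so the example is self-contained); in practice, generated code should be sandboxed and reviewed before execution.

```python
def fake_llm(prompt):
    """Stand-in for an LLM call; returns a canned transformation for this demo."""
    # A real system would send `prompt` to a hosted model and receive code back.
    return "lambda row: {**row, 'full_name': row['first'] + ' ' + row['last']}"

def transform_with_llm(rows, request):
    """Ask the (stubbed) LLM for a row transformation and apply it."""
    generated = fake_llm(f"Write a Python row transformation that: {request}")
    fn = eval(generated)  # in production: sandbox, lint and human-review first
    return [fn(row) for row in rows]

rows = [{"first": "Ada", "last": "Lovelace"}]
print(transform_with_llm(rows, "combine first and last into full_name"))
# → [{'first': 'Ada', 'last': 'Lovelace', 'full_name': 'Ada Lovelace'}]
```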
As for building trust, Ronthal suggests keeping humans in the loop with a “crawl, walk and run” paradigm.
“Start by [AI] making recommendations that are reviewed by humans. If those are correct and have the desired impact, eventually we will build trust and reduce the required level of supervision. Ultimately, we will reach a point where the AI has been right so many times that we can allow it the autonomy to automate optimizations with minimal supervision. The stages of maturity can broadly be described as: observing, reporting, recommending, optimizing and predicting. The last three are where augmentation is applied,” he said.
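The "crawl, walk and run" progression can be sketched as a trust gate: recommendations are auto-applied only once the AI's track record, as judged by human reviewers, clears a threshold. The threshold and reset policy below are illustrative assumptions, not Gartner's prescription.

```python
class TrustGate:
    """Grant autonomy only after enough consecutive human-approved recommendations."""

    def __init__(self, required_approvals=10):
        self.required_approvals = required_approvals
        self.approvals = 0

    def record_review(self, human_approved):
        if human_approved:
            self.approvals += 1
        else:
            self.approvals = 0  # a rejected recommendation resets earned trust

    def autonomous(self):
        return self.approvals >= self.required_approvals

gate = TrustGate(required_approvals=3)
for verdict in [True, True, True]:
    gate.record_review(verdict)
print(gate.autonomous())  # → True
```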
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.