Anyone who’s anyone knows that data is one of the world’s greatest resources, but businesses face significant challenges as they seek to unlock the benefits of data that’s spread across myriad systems.
The second annual The State of Data and What’s Next report, which includes input from some 400 companies from across the geographic and industrial spectrum, points to the growing shift from a centralized to decentralized data infrastructure, with companies now averaging four to six distinct data platforms — and some as many as 12. This is roughly in line with last year’s data, which found that 52% of respondents had five or more different data platforms in their ecosystem.
There are many reasons why the number of disparate data platforms within a company is growing. For starters, it’s now easier than ever to spin up a new data store, thanks to the proliferation of the cloud.
“Since 2016, cloud technologies have made it very easy and reasonably cheap to spin up new data stores,” Starburst VP of data mesh Adrian Estala told VentureBeat. “‘Storage is cheap’ was a common phrase — you could literally spin up a new environment in days, with a credit card.”
On top of that, there is simply more data than companies know what to do with, which has inevitably led to a gargantuan data sprawl.
“From IoT to sensors and mobile devices, we suddenly had more data than we ever imagined,” Estala continued. “Forbes had a famous quote that 90% of the data had been created in the last two years, [but] we probably created that much in a month this year. If data is the new oil, then what took 50 million years to create (oil), now takes a month (data).”
For context, a “data platform” could be anything from an analytics system or data lake to a data warehouse or object storage. The more such platforms a company has in its IT setup, the more complex it becomes to unlock big data insights. This is particularly true for so-called “streaming” data, which is concerned with harnessing data in real time. This can be useful if a company wants to generate insights into sales as they’re happening, for example.
When asked what types of new data they planned to collect in the next year, 65% of respondents cited streaming data as their top priority, followed by video and event data, which tied at 60%.
Elsewhere, the report found that around half of the companies surveyed take more than 24 hours to create a new data pipeline to move and transform data between locations, and then a further 24 hours (at least) to operationalize the pipeline and deploy it into a production setting.
This was identified as one of the major problems that companies face as they strive for real-time business insights, and is partly why the industry is moving away from the pipeline process toward a decentralized model — or a “data mesh,” as it’s often called today. This data mesh basically makes data available to anyone, anywhere in a company, with a focus on speed — being able to access the data at its source, rather than having to transport and centralize it.
The report showed that while the rate of change varies by region, companies by and large are planning a more decentralized data architecture strategy in the coming months.
And this, according to Estala, was one of the biggest surprises in this year’s report: the speed at which organizations have pursued decentralization.
“The shift to a decentralized model happened very, very fast,” Estala said. “Just a year ago, we were having difficult arguments on the best way forward — the big cloud providers that many organizations were ‘hitched’ to were adamant that centralization was the only way. This shift [to decentralized] is business-driven, not IT-driven. This demonstrates the urgency to deliver digital transformation. IT has realized that we can’t migrate to — or sustain — a centralized architecture with the efficiency that the business demands.”
Fast and efficient
Ultimately, companies are starting to prioritize faster data access, and this is partly in response to the pandemic-driven challenges of the past couple of years. The report noted that supporting customer engagement was the most common driving force behind companies’ push toward real-time data and analytics (33%), followed by a desire to stay ahead of risk and marketing swings (29%) and employee engagement (29%).
Other notable trends to emerge from the report include the great migration toward the cloud, with respondents noting that 59% of their data is now stored in the cloud vs. 41% on-premises, up from the 56% vs. 44% that emerged in last year’s report.
Aside from highlighting the growing prominence of cloud computing, this also serves as a timely reminder that multi-cloud and hybrid models remain a popular alternative for companies that are unwilling or unable to make the full transition. Indeed, “multi-cloud flexibility” was cited as the top (43%) influencing factor in respondents’ buying decisions regarding cloud data storage, with “hybrid interoperability” jumping to 34% from 26% in last year’s report.
“Multi-cloud was not where we thought we would end up when we were designing cloud strategies seven years ago, but it is now a reality,” Estala said. “This, more than anything else, underscores why a decentralized approach like data mesh is the only way forward.”
The 2022 State of Data and What’s Next report is available to download now.