3 questions you should ask to get the most out of edge data

For years, enterprises have touted the benefits of a data-first approach — where every major business decision is informed by data insights. With cloud adoption and greater accessibility of artificial intelligence (AI) and machine learning (ML), more data teams have started to live out this ideal. But there’s a curveball being thrown at data leaders, and it’s coming from the edge.

An essential building block of a data-first strategy is access: You need easy access to critical data streams to analyze them and put them to use. But when it’s predicted that 75% of enterprise-generated data will be created and processed outside of the cloud or centralized data centers, data leaders are in a predicament. Their fastest-growing data source is far, far away from centralized analytics environments, rendering it effectively useless for centralized analysis.

This challenge is growing: The combination of 5G connectivity and the rapid adoption of IoT devices in industries like manufacturing, automotive, logistics and energy is supercharging the edge technology market, which is expected to hit around $116.5 billion by 2030.

So how will data-first strategies evolve when your largest data source becomes your most distributed?

We’re in the early stages of seeing this play out, and I expect that a significant amount of innovation in the coming years will help alleviate this challenge. But for now, there are three essential questions data leaders can consider as they look to apply a data-first mindset to the edge.

After real-time processing, what's next?

Vendors like EdgeConneX and ClearBlade, as well as AWS and Azure Stack Edge, have made it easy for enterprises to derive value from edge data in real time. In manufacturing, edge processing enables predictive maintenance for equipment; in healthcare it allows patients to monitor health from home; and in the automotive industry it makes self-driving cars a reality. Computing outside the centralized data center has been, and will continue to be, game-changing for so many industries.

But data leaders hard-wired with a data-first mindset are naturally starting to wonder: Does the value of edge data stop at the edge?

After real-time processing, data often ends up sitting in edge data stores, collecting dust (and storage costs). This growing pool of critical user data is being left out of the AI applications running in platforms like Snowflake or Databricks — the driving forces behind next-generation customer experiences and strategic business decisions. As this data piles up, more and more data leaders are starting to explore where the long-term value of this data source lies.

Which brings us to question No. 2.

Are edge data centers always the most cost-efficient?

To date, edge data centers have proven to be a cost-efficient home for IoT data. But as IoT devices proliferate, the edge data cost analysis is starting to skew. When a single enterprise generates as much as 60 petabytes of edge data every two weeks, storage costs add up. For some, this volume of edge data is translating to multiple millions of dollars a year, which will only go up over time.
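To make the cost argument concrete, here is a rough back-of-envelope sketch. The 60 petabytes every two weeks comes from the figure above; the retained fraction and the price per gigabyte-month are purely illustrative assumptions, not vendor quotes, and the function name is hypothetical.

```python
GB_PER_PB = 1_000_000  # decimal petabyte expressed in gigabytes


def first_year_storage_cost(pb_per_period, periods_per_year,
                            retained_fraction, usd_per_gb_month):
    """Sum one year of storage bills as retained edge data accumulates.

    All inputs are assumptions supplied by the caller; nothing here
    reflects real pricing from any provider.
    """
    # Data kept from each generation period, in gigabytes.
    gb_per_period = pb_per_period * GB_PER_PB * retained_fraction
    # Convert the monthly price to a price per generation period.
    usd_per_gb_period = usd_per_gb_month * 12 / periods_per_year

    stored_gb = 0.0
    total_usd = 0.0
    for _ in range(periods_per_year):
        stored_gb += gb_per_period           # new data lands
        total_usd += stored_gb * usd_per_gb_period  # pay for everything held
    return total_usd


# Assumed inputs: keep 1% of 60 PB per two-week period at $0.02 per GB-month.
cost = first_year_storage_cost(60, 26, 0.01, 0.02)
print(f"Estimated first-year storage cost: ${cost:,.0f}")
```

Because the retained pool keeps growing, the bill in the second year starts from where the first year ended, which is why the total tends to climb rather than level off.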

Sure, if you’re not paying to store data at the edge, you’ll be paying to store it in the cloud. But the difference is ROI. While data at the edge adds value in the moment, it does nothing over time. If it were in the cloud, on the other hand, it could start informing new product lines or spark strategic partnerships.

So before edge storage fees get unruly, many enterprise data teams are assessing what to do with their edge data: continue to store it at the edge to be analyzed locally; delete it to save money and/or mitigate privacy concerns; or move it to a centralized data center or cloud environment.

Can my data architecture withstand exponential growth?

If you decide that hanging onto your edge data makes sense, you’ll have to think strategically about what that means for your data architecture. In a world where edge data is king, modern data architectures will need to thrive with data that grows exponentially.

This could mean you need to scale up or scale out edge data storage. It could also mean building a data pipeline that accommodates continuous data movement.
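One way to reason about whether to scale storage or start moving data is to estimate how quickly compounding growth consumes current capacity. The sketch below is a simplified model that assumes a constant growth rate per period; the function name and the sample numbers are hypothetical.

```python
def periods_until_full(current_tb, capacity_tb, growth_per_period):
    """Count the periods until stored volume reaches capacity.

    Assumes volume compounds at a fixed rate each period, which is a
    simplification; real edge data growth is rarely this regular.
    """
    if current_tb <= 0:
        raise ValueError("current_tb must be positive")
    if growth_per_period <= 0:
        raise ValueError("growth_per_period must be positive")

    periods = 0
    volume = current_tb
    while volume < capacity_tb:
        volume *= 1 + growth_per_period  # compound growth for one period
        periods += 1
    return periods


# Assumed example: 100 TB stored today, 800 TB of capacity,
# volume doubling every period.
print(periods_until_full(100, 800, 1.0))
```

The point of the exercise is that under compounding growth, adding capacity buys fewer periods each time, which is part of the case for a pipeline that moves data continuously.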

Data migrations have traditionally been viewed as one-and-done processes. But the edge is forcing everyone to rethink this assumption. Migration now has to happen continuously, and originate from highly distributed environments. You’re no longer just pulling data, once, from an on-premises data center and dropping it in AWS, Azure or GCP.

To accommodate this shift, some companies are streaming small amounts of data to the cloud, slowly but surely centralizing subsets of business-critical edge data. Alternatively, extremely data-heavy enterprises are looking to automate edge migrations at scale. Whatever route you take, factoring in the reality that your data is growing exponentially is essential to maximizing its value over time.
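The "small amounts, continuously" approach can be sketched as a loop that groups edge records into size-limited batches and hands each one to an uploader. This is a minimal illustration, not any vendor's migration tooling: `upload` is a hypothetical callable standing in for whatever cloud client you use, and the batch size is an assumed default.

```python
from typing import Callable, Iterable, Iterator, List


def batch_by_size(records: Iterable[bytes],
                  max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group records into batches whose total size stays under a byte budget.

    A single record larger than the budget is sent alone in its own batch.
    """
    batch: List[bytes] = []
    size = 0
    for record in records:
        # Close the current batch if adding this record would exceed the budget.
        if batch and size + len(record) > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += len(record)
    if batch:
        yield batch  # flush whatever is left


def migrate_continuously(records: Iterable[bytes],
                         upload: Callable[[List[bytes]], None],
                         max_batch_bytes: int = 1_000_000) -> int:
    """Stream batches to a hypothetical uploader; return the record count sent."""
    sent = 0
    for batch in batch_by_size(records, max_batch_bytes):
        upload(batch)
        sent += len(batch)
    return sent


# Usage with a stand-in uploader that just collects batches in memory.
received: List[List[bytes]] = []
count = migrate_continuously([b"aaaa", b"bbbb", b"cccc"], received.append,
                             max_batch_bytes=8)
print(count, len(received))
```

Because `records` can be any iterable, including a generator that never ends, the same loop covers both a one-time backfill and an ongoing stream.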

Evolving the data-first mindset

As edge data becomes your fastest-growing data source, your data-first strategies have to evolve. There is no one right answer for how enterprises should make use of their edge data, but assessing its long-term potential and building the right processes to accommodate (and benefit from) its scale are helpful places to start.

Once these essential questions are answered at the edge, data-first strategies could yield even bigger and more transformative results.

David Richards is CEO of WANdisco.
