We're sitting on a big data time bomb

$114 billion. That’s how much global organizations will spend on big data in 2018, an increase of more than 300 percent in just five years. But how much of that is money well spent?

Over the past 10 or so years, we’ve seen widespread adoption of new approaches for managing big data such as MapReduce and the introduction of schema-less databases for massive-scale storage, as well as complementary technologies like Hadoop, Storm, and Spark for storage and processing. But making use of big data means more than deploying a particular platform or paradigm: At its best, it means a total redesign of how companies structure and organize data.

Despite big data’s promising benefits, few organizations have begun the essential steps to prepare for the adoption of new capabilities and data platforms. An industry survey of global companies found that only 35 percent have “robust processes for data capture, curation, validation, and retention.” Equally troubling, 67 percent “do not have well-defined criteria in place to measure the success of their big data initiatives.” Instead, big data solutions are integrated reactively, department by department, or not at all.

The amount of available data in the world will have exploded to 44 zettabytes by 2020 — 10 times what it was in 2013, according to a 2014 IDC report. Companies that fail to prepare for this next generation of massive data volume and insights run the risk of incurring operational and technical debt. In an example of corporate natural selection at work, those that fall behind are doomed to wither away.

Here’s what they can expect as this big data time bomb goes off.

Catastrophic loss of transparency. Few IT professionals have experience managing big data platforms at scale — a situation that has created a massive skills shortage in the industry. By 2018, U.S. companies will be short 1.5 million managers able to make data-based decisions. A recent McKinsey Quarterly report estimates that, in order to close this gap, companies would need to spend 50 percent of their data and analytics budget on training frontline managers; it also notes that few companies realize this need.

As data needs broaden, managers without a firm understanding of information management and best practices in data extensibility will encounter major challenges in managing data-driven systems. With poor operational transparency, businesses will struggle to identify whether data is accurate and meaningful, and even whether key reports and metrics are running properly. Being able to grasp these intricacies and ask the right questions about data will become a mandatory skill. Anything less will mean a lack of visibility into how your business is run, inhibiting informed decision making and diminishing your company's competitive edge.

Skyrocketing personnel costs. In 2014, data scientists spent an estimated 50-80 percent of their working hours on cleaning and processing datasets. In the near term, companies are often tempted to outsource the automation of data preparation tasks to offshore or nearshore data specialists. Demand for these services is already fueling an explosion of microwork platforms like CloudFactory, MobileWorks, and Samasource, which are expected to become a $5 billion industry by 2018.

However, the outsourcing approach doesn’t scale. With data volumes headed toward the predicted 44 zettabytes, growth this rapid would require thousands of offshore and nearshore workers, which is not a viable long-term solution. Any sustainable solution will need to involve significant automation.

Communications blockage. Companies today interact with each other through curated data, but the effort to facilitate that process pales in comparison to what is coming within the next 20 years. A new standard of corporate data networking will emerge involving organizations of all sizes trading, publishing, and measuring curated datasets as well as the corresponding algorithms and metadata. A company that’s not able to participate in this global data marketplace will be unable to capitalize on the market intelligence on offer.

This evolution to commercial mass data-sharing is already underway in every sector of the global economy. Under pressure to allow third-party verification of their research, pharmaceutical companies such as GlaxoSmithKline recently proposed plans to share clinical trial data more broadly. President Obama has called upon tech companies to share data about potential hacking threats. A recent Forrester report predicts that data services will become “a mainstream aspect of product offerings” in 2015, citing examples from John Deere’s FarmSight to LexisNexis’ analytics products. At this pace, by the next decade effective use of big data won’t just be key to winning in the marketplace; it will be a prerequisite for participation.

Despite these impending challenges, you can avoid the big data time bomb -- if you take action now. Here are three steps that can defuse this oncoming explosion within your company.

1. Resist a "collect data now, figure out later" approach

To ensure future analytics capabilities, companies must invest now in a platform that enables fast, efficient onboarding of new datasets. They should consider how their business will operate in the future with regard to data ingestion and federation and how the transition from legacy systems to end-to-end automated data and analytics should take place.

Central to this is the ability to invest in a new platform that scales purposefully, carefully, and transparently, as opposed to collecting data without a clear objective or investing effort to interpret the data being collected.

2. Re-architect legacy data applications — no matter how painful it seems

Many companies rely too heavily on outdated legacy systems with high maintenance overheads, and upgrades or strategic changes are deprioritized because of their cost. That’s true even of major tech companies. For instance, although Samsung’s SmartHub TV software runs on the cloud, all its financial transactions are still handled on-premises due to the cost of moving them.

The net result is that in many organizations, data is siloed across many divisions. Some data — such as social media stats — are even stored outside the company, creating another layer of complexity. To innovate in big data, companies must revamp legacy data applications with a focus on greater operational transparency across a variety of departments.

3. Take a modular and multi-granular approach to data management

The further you can shape raw and insightful data into modular, well-organized entities at varying levels of granularity, the more likely you will be to leverage business insights efficiently and to remain nimble enough to react to the ever-changing big data landscape. This is how you can defuse the big data time bomb.

(Thanks to Poornima Apte and Edward Newell of Hippo Reads for their research help on this story.)

Cameron Sim is CEO of Crewspark.
