I’ve been researching big data and analytics for well over a decade, and my focus has generally been on the business opportunities provided by this amazing resource. How can big data and analytics transform marketing, finance, human resources, product development, and so forth? How can companies compete more effectively with these tools?

This focus on opportunity is tantalizing, but every so often you have to look under the covers at what makes these stirring accomplishments possible. That’s the situation we are in today with big data.

The difficulty of extracting and integrating data from a variety of big data sources has become an issue that organizations cannot ignore.

It's the dirty little secret of business analytics: it often takes more energy to extract, clean, and integrate the data than to analyze it. (A recent New York Times article called this the “janitor work,” a fine analogy, but I prefer “plumbing” overall.)

Look, it wasn’t that easy to pull together the necessary data environment even for small data. But with big data (which involves data from multiple sources, each of which has problematic attributes), that problem has become the elephant in the room.

In fact, I think that all this data preparation plumbing activity is a leading cause of the lack of sophisticated analytics on big data, the “big data = small math” phenomenon that I and others have observed. There often isn’t enough time or energy left to analyze the data with advanced analytics after doing all the janitorial work.

At times like this, organizations need to devote energy not only to the big data opportunity, but also to improving the plumbing. Our data pipes are clogged, and they’re preventing the water of business opportunity from flowing. Data scientists could be helping to cure cancer, or at least traffic jams, but instead they are spending all of their time removing outliers, matching up keys, and dealing with missing data.
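
For readers who have not done this kind of plumbing themselves, here is a minimal Python sketch of those three chores. Everything in it is an illustrative assumption rather than a recommended method: the function names, the sample values, the median fill for missing data, and the median-absolute-deviation rule for outliers are all hypothetical choices, and real pipelines are far messier.

```python
from statistics import median


def normalize_key(raw):
    """Matching up keys: strip punctuation, spaces, and case so that
    ' ab-001 ' and 'AB001' are treated as the same identifier."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def fill_missing(values):
    """Dealing with missing data: replace None with the median of the
    values that were actually observed (one simple choice among many)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, k=3.0):
    """Removing outliers: keep values within k median absolute
    deviations of the median (the threshold k is an assumption)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against
    return [v for v in values if abs(v - med) <= k * mad]


# Hypothetical order amounts with one gap and one obvious error.
amounts = [120.0, None, 105.0, 110.0, 9999.0]
cleaned = drop_outliers(fill_missing(amounts))
print(cleaned)  # [120.0, 115.0, 105.0, 110.0]

# The same customer, spelled two ways in two hypothetical systems.
print(normalize_key(" ab-001 ") == normalize_key("AB001"))  # True
```

Even in this toy version, every function hides a judgment call (what counts as an outlier, what to fill a gap with, which characters matter in a key), and multiplying those calls across many sources is what makes the work so time-consuming.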

The good news is that a growing number of new tools can improve the productivity and effectiveness of data plumbing. Powerful methods such as machine learning that speed up analytics can also speed up data preparation and curation. Like any plumber, you wouldn’t want to bring a single tool to the jobsite. Instead, a data scientist should have access to a variety of tools and be prepared to use the right one for the task at hand.

I’ll be discussing both the need for solutions and some of the available tools in a VentureBeat webinar on Oct. 1, 12:30 p.m. EST. The webinar is sponsored by Tamr, a big data “plumber’s helper” that I advise. I hope you can join me to discuss this important topic for the future of big data.

We have nothing to lose but our clogs.


Tom Davenport has taught business management as the President’s Distinguished Professor in Management and Information Technology at Babson College, as well as at Harvard Business School, the University of Chicago, Dartmouth’s Tuck School of Business, and the University of Texas at Austin. He has also directed research centers at Accenture, McKinsey & Company, Ernst & Young, and CSC. His latest book, Big Data at Work, was released earlier this year.