SAN FRANCISCO — Ask analytics wonk Tom Davenport what’s changed in the decades since companies started collecting and reporting on data, and he’ll talk about the rise in the number of data sources, the emergence of data scientists, and the need to get more people inside companies analyzing data.
But as Davenport interviewed data scientists (along with DJ Patil, a co-creator of the term “data science”) about what those people actually do for a living, he realized that their jobs weren’t as sexy as some people might imagine.
“People spend a huge amount of time on what they call munging data or extracting, filtering, cleaning data from various kinds of systems,” Davenport, author of the 2014 book Big Data @ Work: Dispelling the Myths, Uncovering the Opportunities, said at VentureBeat’s DataBeat conference today.
Davenport became convinced that as more people inside companies want to analyze more kinds of data, data scientists need to cut down on the amount of time they spend on this dirty work.
Tools like Trifacta and Paxata have emerged in the past few years to speed up the cleaning work, and more recently, Davenport said he’s been influenced by the approach that startup Tamr is taking to data management.
Tamr incorporates machine learning as well as crowdsourcing, said Davenport, who is also the president’s distinguished professor of information technology and management at Babson College. And with support for NoSQL databases and the Hadoop open-source software for managing lots of different kinds of data, Tamr could be a key part of the latest generation of data analytics.
“I don’t think you can do all this without adopting some new approaches to data integration and curation,” Davenport said. “It’s just not going to happen without that.”
As with all our events, we cover every company that appears onstage at DataBeat without regard to sponsorship.