Data scientists shouldn’t have to spend the majority of their time doing grunt work: getting data ready for the analysis that could really make a difference for their employers. They should be able to come up with cool new ideas worth evaluating. No wonder investors are jumping all over software that automates parts of the data-cleaning workflow.
Today, one of the hottest startups in that market, Trifacta, announced a $25 million round of funding, following the $12 million round it revealed just six months ago.
Trifacta wants to grow and get its software running at many more businesses, so taking investors up on a great deal made sense, cofounder and chief executive Joe Hellerstein said in an interview with VentureBeat.
“It was an attractive offer,” Hellerstein said. “Walking away just because we could probably makes less sense than saying yes just because we could.”
The funding news arrives a week after Tamr, a startup that combines automation and human intelligence to accelerate the process of integrating different kinds of data, launched with more than $16 million in venture backing.
Rather than focus on the data warehouses that many big companies use to hold a wide variety of data, Trifacta sits on top of Hadoop, the open-source software for storing and analyzing lots of different kinds of data. It provides a visual interface that previews what data-transformation scripts will produce when run in Hadoop, before the data is actually processed. Trifacta remembers user preferences based on previous interactions and abstracts away the difficult task of writing scripts.
It’s the sort of software that many people inside a company can use if they want to analyze data sitting in Hadoop that ordinarily would be too complex to work with.
Hellerstein believes Trifacta competes less with legacy vendors’ software for extracting, transforming, and loading data, or even with Tamr, than with the practice of maintaining teams of people to clean up data. Still, startup Paxata does show up on the competitive landscape, Hellerstein said.
Trifacta’s customers include Lockheed Martin and Accretive Health.
San Francisco-based Trifacta started in 2012 and now has 35 employees. To date, the startup has taken on $41.3 million in funding. Ignition Partners led the new round. Accel Partners and Greylock Partners also participated. Intel-backed Hadoop distribution vendor Cloudera announced a partnership with Trifacta in March.
Trifacta’s software does without dazzling visualizations and instead opts for basic histograms that give people a sense of what’s inside a column of data, and of what would happen if a person were to apply a certain kind of tweak to that column. That might surprise people who expect a lot of data-visualization flair from a startup with Jeff Heer as a cofounder.
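For readers who want a concrete picture of the idea, here is a minimal Python sketch of a before-and-after histogram preview on a small sample of a column. It is an illustration only, not Trifacta’s code or interface: the function names (`histogram`, `preview_tweak`, `render`), the sample “state” values, and the `normalize` tweak are all hypothetical.

```python
from collections import Counter


def histogram(values):
    """Count how often each distinct value appears in a column."""
    return Counter(values)


def preview_tweak(sample, tweak):
    """Return histograms of a sampled column before and after a candidate tweak.

    Only the small sample is touched; the full data set is not processed.
    """
    before = histogram(sample)
    after = histogram(tweak(value) for value in sample)
    return before, after


def render(hist, width=20):
    """Draw a basic text histogram, one bar per distinct value."""
    top = max(hist.values())
    lines = []
    # Most frequent values first; ties are broken alphabetically.
    for value, count in sorted(hist.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{value!r:>12} {bar} {count}")
    return "\n".join(lines)


# Hypothetical sample of a messy "state" column.
sample = ["CA", "ca", " CA", "NY", "ny", "N.Y.", "TX", ""]


def normalize(value):
    """Hypothetical tweak: trim whitespace, uppercase, drop periods."""
    return value.strip().upper().replace(".", "")


before, after = preview_tweak(sample, normalize)
print("Before:")
print(render(before))
print("After:")
print(render(after))
```

In this toy sample, the eight distinct raw values collapse into four after the tweak, which is the kind of change a simple histogram makes visible at a glance.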
Trifacta has been “trying to make sure visualizations we had were well suited to the task,” Hellerstein said. In the future, though, more elaborate visualizations could be on the way.
“I think as the product evolves and as we are taking on more functions, I think the visualizations will evolve, most certainly, alongside them,” Hellerstein said.
Trifacta could also expand its focus beyond Hadoop and add support for other platforms that handle lots of data. Maybe it will support data transformations inside data warehouses. Or it could integrate with the open-source stream-processing software Storm, as Hellerstein alluded to during VentureBeat’s DataBeat conference last week.
“It’s really a business question, as opposed to a technology question,” Hellerstein said.