Before your company can profit from big data, you need to assemble the data in the first place.
This often turns out to be a big technical challenge, since the data comes from many different sources and is in different formats.
That’s one reason Hadoop has become so popular. It’s a collection of open-source tools for storing, querying, and finding patterns in huge amounts of data of many different kinds. While legacy technology companies like SAP, IBM, and Oracle have been earning big revenues from software that handles this “data integration” process, Hadoop’s open-source nature promises lower costs and more flexibility.
Riding on the Hadoop wave has produced major growth for one data-integration company, Talend.
“I think that market [for Hadoop] is moving faster than any other IT market I’ve personally witnessed,” Talend’s chief executive Mike Tuchen said. “It’s moving faster than the virtualization wave.”
Sales of Talend’s two commercial software packages supporting Hadoop (Talend Enterprise Big Data and the Talend Platform for Big Data) doubled from the third quarter of last year to the fourth quarter, Tuchen said during a recent visit to VentureBeat’s headquarters. Before that, sales rose by 150 percent from the second quarter to the third.
Big data is “driving a ton of our business right now, and so we’re going to hire much more and just double down our efforts in that area,” said Tuchen, who was once general manager of marketing at Microsoft’s SQL Server business division.
Los Altos, Calif.-based Talend has already racked up 4,000 customers over its seven years of existence, and it recently raised a fresh $40 million in funding. Now, focusing on Hadoop is helping the company pick up speed as it heads toward a public offering.
Meanwhile, other companies have also been making progress by playing the Hadoop card. Investors have recently backed startups like Datameer and Alpine Data Labs that have built analytics applications for data in Hadoop.
Revolutions in database land
Hadoop is one of two technologies that have shaken up the database world in the past decade, Tuchen said. First came a new generation of data warehouses for storing lots of data in preparation for analytics. In the past three years, legacy vendors EMC, IBM, Hewlett-Packard, and Teradata have built up or strengthened their data-warehousing portfolios through acquisitions (Greenplum, Netezza, Vertica, and Aster Data, respectively).
The value proposition of these newer data warehouses: They cut the cost of processing data by a factor of 10 compared with the incumbents’ longstanding appliances, Tuchen said.
Now Hadoop is taking hold, reducing the cost of processing large data sets by another factor of 10, he said.
The fact that Hadoop is free helps. Plus, Hadoop’s storage and processing capacity can keep growing simply by adding servers. Hadoop can accept huge amounts of data, and many different kinds of it, so companies can start analyzing all the information they have on hand, regardless of its format.
“I think Hadoop is an enormous, enormous economic force, just given the cost of what’s going on here, and Talend has a unique play relative to Hadoop,” Tuchen said.
Standing out with Hadoop
By “unique play,” Tuchen was referring to the way Talend performs data integration inside Hadoop. Other companies do that integration in a pipeline before the data arrives in Hadoop.
“Hadoop is really good at running things in parallel, across tens or hundreds or thousands of machines,” Tuchen said. “If you just drop it in there and then just transform it in place, you can now run that across the entire Hadoop cluster in parallel across tens or hundreds or thousands of machines at full speed. … It’s a far simpler-to-deploy solution.”
It’s a point-and-click process, Tuchen said. Talend “copies it [data] in and rearranges it so that it’s basically transformed into technically the schema that you’re looking for,” he said. “It’s in the format that you can use to analyze.”
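To make the “drop it in, then transform it in place” idea concrete, here is a minimal sketch in plain Python. It is not Talend’s tooling or any Hadoop API: the records, the target schema, and the function names are hypothetical, and a thread pool stands in for the nodes of a cluster. It only shows the general pattern (often called ELT, load then transform): raw data that has already been loaded, in mixed formats, is split into partitions, and the same transformation runs on every partition concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical raw records already loaded into the "cluster", split into
# partitions and stored in mixed formats (comma- or pipe-delimited).
RAW_PARTITIONS = [
    ["alice,42,US", "bob,37,DE"],
    ["carol|29|FR", "dave,51,US"],
    ["erin|33|JP"],
]


def to_target_schema(line: str) -> Dict[str, object]:
    """Normalize one raw line into a hypothetical target schema."""
    sep = "|" if "|" in line else ","
    name, age, country = line.split(sep)
    return {"name": name.title(), "age": int(age), "country": country}


def transform_partition(partition: List[str]) -> List[Dict[str, object]]:
    """Transform one partition where it already lives, without moving it first."""
    return [to_target_schema(line) for line in partition]


def transform_in_place(partitions: List[List[str]], workers: int = 3) -> List[List[Dict[str, object]]]:
    # Each worker thread stands in for a cluster node; partitions are processed
    # concurrently, and pool.map returns results in the original partition order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_partition, partitions))


if __name__ == "__main__":
    for part in transform_in_place(RAW_PARTITIONS):
        print(part)
```

In a real Hadoop cluster, the framework would schedule this work on the machines holding each block of data, which is what lets the transformation scale across “tens or hundreds or thousands of machines.”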
Meanwhile, Talend appears to have a pricing advantage in data integration. Rather than charging per server, as top competitor Informatica does, Talend charges an annual subscription, so even as data projects grow (and it’s a fair assumption that they will), the price stays consistent.
And these days, price matters quite a bit.
“I think Talend very often wins because of price,” Ted Friedman, a Gartner analyst who covers data integration, said in an interview with VentureBeat. It has “good enough ETL [extract, transform, and load] capabilities at a very attractive price point. … Hadoop clusters can involve a large number of CPUs. It starts to blow Informatica’s pricing model out of the water.”
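Why cluster size matters so much for per-server pricing can be shown with a little arithmetic. The sketch below compares a per-server license with a flat annual subscription. Both prices are invented placeholders, not Talend’s or Informatica’s actual rates, and real subscriptions may be tiered; the point is only the shape of the two cost curves.

```python
# Hypothetical prices for illustration only; NOT actual vendor pricing.
PER_SERVER_FEE = 10_000      # assumed annual license fee per server
FLAT_SUBSCRIPTION = 200_000  # assumed flat annual subscription fee


def per_server_cost(servers: int) -> int:
    """Annual cost when licensing scales linearly with the number of servers."""
    return servers * PER_SERVER_FEE


def subscription_cost(servers: int) -> int:
    """Annual cost under a flat subscription; cluster size is ignored in this simplified model."""
    return FLAT_SUBSCRIPTION


def break_even_servers() -> int:
    """Smallest cluster size at which per-server licensing costs at least as much as the subscription."""
    return -(-FLAT_SUBSCRIPTION // PER_SERVER_FEE)  # ceiling division


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} servers: per-server ${per_server_cost(n):,} vs subscription ${subscription_cost(n):,}")
    print("break-even at", break_even_servers(), "servers")
```

Under these made-up numbers, per-server licensing is cheaper for small deployments, but past the break-even point its cost keeps climbing linearly while the subscription stays put, which is the dynamic Friedman describes for large Hadoop clusters.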
But even though Talend boasts big growth in Hadoop, Informatica is the bigger data-integration company, Friedman said. Friedman wouldn’t be surprised if Informatica changed its business model for Hadoop to keep pace with Talend’s big data growth, especially because Informatica chief executive Sohaib Abbasi sees big data as “a major opportunity for Informatica,” as he said in an interview with CRN last month.
In addition to Informatica, Talend competes with data-integration software from plenty of other vendors, including Oracle and SAP.
Talend continues to perform data integration for data warehouses. But that’s the old guard, technology-wise. Hadoop is what’s hot. “That business is exploding right now,” said Tuchen, who joined Talend in October.
Talend brings in more than $50 million in annual revenue now, Tuchen said.
“We will double the company in the next couple of years and grow from there,” he said. “It won’t be too long before this is a couple-hundred million-dollar company.”
And an initial public offering (IPO) could be on the horizon for Talend.
“I’d say you can’t predict the future, because the public markets are not always as accepting as they are now, but I’d say, you know, sometime in the next couple of years,” Tuchen said.
And if Hadoop ends up playing a big part in a successful Talend IPO, that would not only validate Talend’s approach but also underline the advantages of Hadoop over more traditional technologies.