Databricks, a startup that provides support for the popular open-source Apache Spark project, will keep pushing the technology for speedily analyzing lots of data and releasing new products based on it, thanks to a new funding round.
The startup announced the fresh $33 million this morning and also revealed a new service that takes the company in a new direction: a platform for running and managing Spark jobs and visualizing data on a Databricks-owned cloud.
On this new cloud, companies can run Spark-based applications for business intelligence and other purposes. The point is to eliminate the need to tinker with a zoo of tools in order to clean up, process, and analyze data, Databricks chief executive Ion Stoica told VentureBeat in an interview.
“Our goal is to say we want to make it as easy as possible for people to use it,” Stoica said of Spark.
Big data people see Spark and its toolset (the Shark SQL query engine, the Spark Streaming tool for processing data on the fly, the MLlib library for machine learning, and the GraphX API for graph processing) as the successor to technologies based on MapReduce, the initial programming model for the Hadoop ecosystem of open-source tools for analyzing lots of different kinds of data.
The Spark community claims vast performance improvements over MapReduce, primarily because Spark makes more efficient use of computing resources. By the community's figures, programs can run as much as 100 times faster in memory or 10 times faster on disk. So the Hadoop distribution vendors have caught on and started incorporating Spark into their product offerings.
And Databricks partners with those Hadoop distribution vendors, like Cloudera and MapR, to support customers’ use of Spark.
Going forward, though, companies will have the choice of tapping Databricks’ cloud whenever they want, without worrying about managing everything on their own.
The cloud service initially runs on Amazon Web Services, and it will come to other clouds, like Google’s and Microsoft’s, Stoica said.
Users will be able to create dashboards and notebooks to display their findings in addition to running jobs on the service, he said.
In addition to pushing the new cloud service, Databricks will continue to take steps to expand the Spark community by certifying more applications that can run on Spark and bringing Spark classes to massive open online course (MOOC) sites, Stoica said. Besides heading up the startup, Stoica also teaches computer science at the University of California, Berkeley.
To date, Berkeley, Calif.-based Databricks has raised $47 million, including a $14 million round last September.
New Enterprise Associates led the new round. Andreessen Horowitz also participated.
About 30 people work at the one-year-old startup now. It should grow to between 60 and 100 people within a year, Stoica said.