Managing and processing data is hard work, especially for businesses with unwieldy databases. San Francisco-based startup Dremio offers tools that help streamline and curate that data, and its efforts haven’t gone unnoticed.
The self-described “data-as-a-service” company today announced an investment from Cisco Investments, bringing its latest round of funding to $30 million and the total amount raised to $45 million. This follows on the heels of two previous funding rounds: one in January led by Norwest Venture Partners, Lightspeed Venture Partners, and Redpoint Ventures, and another in late 2015.
“When we make any investment, we gain insight into new markets and offer our customers visibility into best-in-class innovation,” Rob Salvagno, a spokesperson for Cisco Investments, said in an emailed statement. “Dremio’s initial traction has been very promising … [and it] also is applicable to Cisco’s own internal movement towards digital transformation.”
Dremio’s full-stack virtualization toolkit connects, analyzes, and processes data in part with so-called Data Reflections, which accelerate database queries without replicating data. Its platform bridges the gaps between relational databases, Hadoop, NoSQL stores, Elasticsearch, and other data sources, and it presents itself to business intelligence software as a primary data source that can be queried via SQL.
Crucially, it maintains a catalog of data sources, physical and virtual datasets, and datasets’ lineage, making it easy to search for datasets and see how data are being transformed. Tomer Shiran, cofounder and CEO of Dremio, described it as “Google Docs for datasets.”
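To make the idea of a catalog with lineage concrete, here is a minimal Python sketch. It is not Dremio's API or implementation: the `Dataset` and `Catalog` classes, their methods, and the example dataset names are all hypothetical, chosen only to show how physical datasets, virtual datasets derived from them, and the lineage between the two might be recorded and searched.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical catalog entry; not a Dremio type."""
    name: str
    kind: str                                    # "physical" or "virtual"
    parents: list = field(default_factory=list)  # names of datasets this one derives from
    transform: str = ""                          # human-readable note on the transformation


class Catalog:
    """Toy in-memory catalog that tracks datasets and their lineage."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        # Every parent must already be known, so lineage never dangles.
        for parent in dataset.parents:
            if parent not in self._datasets:
                raise KeyError(f"unknown parent dataset: {parent}")
        self._datasets[dataset.name] = dataset

    def search(self, term):
        # Case-insensitive substring match on dataset names.
        term = term.lower()
        return sorted(n for n in self._datasets if term in n.lower())

    def lineage(self, name):
        # Breadth-first walk over parents: nearest ancestors come first,
        # and each ancestor is listed once.
        seen = []
        queue = list(self._datasets[name].parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            queue.extend(self._datasets[current].parents)
        return seen


# Example data (invented for illustration).
catalog = Catalog()
catalog.register(Dataset("orders_raw", "physical"))
catalog.register(Dataset("customers_raw", "physical"))
catalog.register(Dataset("orders_clean", "virtual", ["orders_raw"],
                         "drop rows with a null order_id"))
catalog.register(Dataset("orders_by_customer", "virtual",
                         ["orders_clean", "customers_raw"],
                         "join on customer_id"))

print(catalog.search("orders"))
print(catalog.lineage("orders_by_customer"))
```

In this sketch a virtual dataset stores only the names of its parents and a note on the transformation, so tracing where a dataset came from is a graph walk rather than a copy of the underlying data.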
“We make data accessible to employees,” Shiran told VentureBeat in a phone interview. “It’s available to anyone who has permission to access it.”
To that end, Dremio contributes to projects like Parquet, Calcite, and Gandiva, an Apache-licensed open source execution kernel for compiling and evaluating expressions on Apache Arrow data. It also offers Dremio 2.0, a suite of analytics and data processing tools optimized for enterprises.
Dremio 2.0 includes the aforementioned Data Reflections technology. It automatically detects schemas, supports cloud data lakes in Amazon S3 and other cloud storage providers, and leverages Apache Arrow to speed up performance by as much as a factor of a thousand. It also builds in features like automatic failover, which selects new nodes in the event of node and instance failures within a cluster, and dynamic granular access, which provides programmatic security controls through integration with Kerberos, LDAP, and other centralized providers.
Artificial intelligence is a core pillar of Dremio’s product lineup, Shiran said. The Dremio Learning Engine, which launched alongside Dremio 2.0 earlier this year, uses machine learning to recommend complementary datasets to users, adapt data catalogs automatically in response to changes in schema and query execution, and intelligently cache and index metadata.
“We see [AI] as an opportunity to make companies’ experience with data as streamlined as it is in our personal lives,” Shiran said.
Dremio’s platform, which runs in the cloud via Kubernetes or in a Hadoop cluster, is available in an open source Community edition as well as a commercial Enterprise edition. (Subscription pricing scales based on the number of nodes to which Dremio is deployed — it can support more than 1,000.) The company, which has more than 70 employees, boasts customers in over 80 countries, including large enterprises like Daimler, Fred Hutch, Idio, Intel, OVH, Royal Caribbean, TransUnion, and UBS.