It’s been almost two years since I took a first stab at charting the booming big data ecosystem, and it’s been a period of incredible activity in the space. An updated chart was long overdue, and here it is (click the full-screen option to enlarge):
Editor’s note: VentureBeat has cherry-picked the best of these companies to present at our upcoming DataBeat conference, May 18-19 in San Francisco.
A few thoughts on this revised chart, and the big data market in general, largely from a VC perspective:
Getting crowded: Entrepreneurs have flocked to the space, VCs have poured money into promising startups, and as a result, the market is starting to get crowded. Certain categories like databases (whether NoSQL or NewSQL) or social media analytics feel ripe for consolidation or some sort of shakeout (which may have already started in social analytics with Twitter’s acquisitions of BlueFin and GNIP).
While there will always be room for great new startups, it seems that a lot of the early bets in the broader infrastructure and analytics segments have been made at this stage, and the bar for success is getting higher, which doesn’t mean that VC money will stop pouring in. In terms of this specific industry chart, we’ve clearly reached the limit of how many companies we can fit on one page. I’m sure there are a number of great companies we either missed or didn’t have enough space to include; apologies in advance to those, and I’d love to hear people’s thoughts and suggestions in the comments section about who else should be included.
Still early: Overall, we’re still in the early innings of this market. Over the last couple of years, some promising companies failed (for example: Drawn to Scale), a number saw early exits (for example: Precog, Prior Knowledge, Lucky Sort, Rapleaf, Nodeable, Karmasphere), and a handful saw more meaningful outcomes (for example: Infochimps, Causata, Streambase, ParAccel, Aspera, GNIP, BlueFin Labs, BlueKai).
Meanwhile, some companies seem to be reaching significant scale and have raised spectacular amounts of money (for example, MongoDB has now raised over $230M, Palantir almost $900M, and Cloudera $1B). But overall, we’re still early in the curve in terms of successful IPOs (Splunk or Tableau notwithstanding) and large exits, although the big companies are getting more acquisitive in the space (Oracle with BlueKai, IBM with Cloudant). In many segments, startups and large companies are jockeying for position and no obvious leader has emerged.
Hype, meet reality: A few years into a period of incredible hype, is big data still a thing? While big data is becoming less press-worthy, the next couple of years are going to be hugely important for this market, as corporations start moving projects from experimentation to full production. While those deployments will lead to rapidly increasing revenues for some big data vendors, they will also test whether big data can truly deliver on its promise. Meanwhile, the fundamental need for big data technology keeps increasing as the deluge of data keeps accelerating, powered in part by the rapidly emerging Internet-of-things industry.
Infrastructure: Hadoop seems to have solidified its position as the cornerstone of the entire ecosystem, but there are still a number of competing distributions, a situation that will probably need to evolve. Spark, an open-source framework that builds on top of the Hadoop Distributed File System, is getting a lot of buzz right now because it promises to fill in the places where Hadoop has been weak, namely interactive speeds and good programming interfaces (and early signs suggest it is fulfilling that promise). Some themes (for example, in-memory or real-time processing) continue to be top of mind; others are appearing (for example, there’s a whole new generation of data transformation/munging/wrangling tools, including Trifacta, Paxata and DataTamer).
Another key discussion is whether enterprise data will truly move to the cloud (public or private), and if so, how quickly. Many will argue that Fortune 500 companies will keep their data (and the software to process it) on premises for years to come; a generation of Hadoop-in-the-cloud startups (Qubole, Mortar, etc.) will argue that all data is moving to the cloud long term.
Analytics: This has been a particularly active segment of the big data ecosystem in terms of startup and VC activity. From spreadsheet-type interfaces to timeline animations and 3D visualizations, startups offer all sorts of different analytical tools and interfaces, and the reality is that different customers will have different types of preferences, so there’s probably room for a number of vendors. Go-to-market strategies differ as well. Some startups focus on selling tools to data scientists, a group that is still small but growing in numbers and budget. Others adopt the opposite approach and sell automated solutions targeting business users, bypassing data scientists altogether.
Applications: As predicted, the action has been slowly but surely moving to the application layer of big data. The chart highlights a number of exciting startups that are fundamentally powered by big data tools and techniques (certainly not an exhaustive list). Some offer horizontal applications, such as big data-powered marketing, CRM tools, or fraud detection solutions. Others use big data in vertical-specific applications. Finance and ad tech were always early leaders in adopting big data, years before it was even called big data. Gradually, the use of big data is spreading to more industries, such as healthcare and biotech (particularly in genomics) or education. This is only the beginning.
[Many thanks to my FirstMark colleague Sutian Dong for doing a lot of the heavy lifting on this chart. My former colleague Shivon Zilis of Bloomberg Beta contributed immensely to prior versions of this chart.]