Big Data has been the subject of a lot of hype lately. But that doesn’t mean it’s a fad; this movement is certainly here to stay. It is essentially the business of how we deal with the volume, variety, and velocity of data that is continuously accumulating around us. The idea is that we can learn quite a bit more about ourselves (both as consumers and enterprises) by sifting through all of the data that surrounds us. We’re now entering the formative years of Big Data as the fifth technology wave (mainframes, PCs, Internet, and social media being the previous four). And just as with every wave before it, your business is going to have to figure out the fundamental opportunities and challenges before you can make Big Data work for you:
Hadoop – Hadoop is the de facto open source collection of tools for supporting Big Data. It has become the driving force behind how we ingest, manipulate, and consume Big Data differently. Hadoop tools represent a cheaper alternative for crunching massive quantities of information. While there are plenty of Hadoop choices (e.g. MapR and Hortonworks), Cloudera is the belle of the Hadoop ball. The biggest challenge remains its overall accessibility to mere mortals; it is very clear that MapReduce programming is still not for the faint of heart.
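To give a feel for what MapReduce asks of a programmer, here is a minimal sketch of the classic word-count pattern in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in memory on one machine. The point is only that even a trivial question has to be recast as separate map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence over an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big hype"]))
```

On a real cluster the map and reduce steps run on many machines and the shuffle moves data across the network, which is a large part of why this style of programming is hard to make accessible.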
The Skills Gap – Powerful tools like Hadoop (and emerging NoSQL databases) are the playthings of data scientists. Unfortunately, there just aren’t enough high-quality data scientists to fill all the open positions for the next decade. The scarce yet valuable skills that a data scientist brings include:
- Ingesting data in all shapes and sizes
- Imputing data when there is an incomplete picture
- Joining data where there are few obvious relationships
- Analyzing data through statistical and algorithmic techniques
- Conveying data to offer better insights to human and/or machine decision-making
- Visualizing data so that there’s a story that others may understand
The challenge is that acquiring these skillsets will require re-training and re-tooling today’s information workers. The Big Data wave will succeed (or not) depending on how democratized these skills become. Incidentally, I affectionately refer to the non-PhDs who are acquiring these skills as data artisans.
Tools vs. Solutions – To date, the companies that deliver tools for data scientists have rightfully claimed primacy in the Big Data space. This is to be expected considering how early we are in this next wave. At the same time, it’s reassuring to hear folks like Cloudera VP of product Charles Zedlewski affirm that solutions need to be built above the Big Data infrastructure layer. A variety of companies are focused on next-gen Big Data solutions, including Datameer, Karmasphere, Platfora, and Alteryx (my own company).
I had a chance to attend the Accel Big Data Conference a few weeks back. It was impressive to see 500+ folks in attendance, all packed into the Stanford Alumni Center to hear what we were all up to in this burgeoning community. Accel, and Ping Li (@ping_accel) in particular, deserve major kudos for putting together such a galvanizing event. Ultimately, it is the responsibility of this emerging community to humanize Big Data: to make it more accessible, to make it more usable, to make it more experiential. You can bet this community will bring Big Data into a majority of businesses over the coming years.