John Funge is cofounder and CEO of streaming data analytics platform provider BrightContext.
When people think of big data, often it’s in terms of Hadoop clusters with gargantuan piles of data.
Yet the human brain is still light-years ahead of these systems in its ability to monitor multiple input streams at once, synthesize them into meaningful, unified models of the world, and take immediate action based on that analysis. While the Internet in its current incarnation is extremely well suited to answering queries by looking up raw facts, the feats of total consciousness that we perform with a yawn on our way to work elude even the most advanced supercomputers.
Consider, for example, something many of us do so frequently it seems commonplace: commuting.
Before our morning coffee has fully kicked in, we have successfully navigated our cars through miles of traffic at high speed. Only half-consciously, we’ve solved the spatial-relational puzzles of merging on and off freeways, parsed the nonstop deluge of critical symbols, colors and sounds that make up our traffic signals, and nonverbally negotiated our positions with everyone else on the road.
And though we are continuously holding the totality of these input streams in mind, making split-second decisions that seamlessly blend instinct and experience, we still have sufficient mental bandwidth to catch up on NPR or sneak in an early conference call.
Computers may have us licked when it comes to sorting through millions of database entries per minute without becoming fatigued, but the true distinction of the human mind is its ability to constantly ingest chaotic floods of disparate data (sounds, lights, verbal symbols, ever-changing spatial relationships, memories and projections) and to instantly digest this madness into a graceful response.
Science’s best attempts to artificially achieve even a fraction of the human brain’s capability bear this out: The world’s fourth most powerful supercomputer, Japan’s K computer, harnessing the combined power of more than 82,000 processors, needed about 40 minutes to simulate a single second of activity in roughly 1 percent of the brain’s neural network.
Such is the gap that the “Big Data” of the future will need to bridge. Today, most people talking about big data are referring to the perpetually incomplete process of periodically chunking through large sets of static data.
But as our expectations for technology continue to grow, the tasks facing our systems are becoming ever more complex and multi-faceted. We expect our systems to operate more like the human brain. They should be aware of multiple streaming inputs that shift constantly, whether they are clickstream analytics, market trends, social media, or troop movements. They should be able to ingest information from all of these sources at once.
To achieve a true integration of data ingestion, synthesis and timely action when dealing with such volumes of information, the systems of tomorrow must natively interpret distributed high-speed data streams as seamlessly as the human brain blends the five senses together to get a complete picture of what is actually happening in real time.
The underpinning technologies that will support these systems are already beginning to emerge.
Advanced sensor networks capable of detecting the most minute stimuli will serve as the eyes, ears and nose of tomorrow’s smart data systems, while stream processing platforms, in-memory and NoSQL databases, and massively parallel architectures will serve as the gray matter.
To see the big data “cerebral cortex” developing, we can look to Internet powerhouses like Twitter, Facebook and Yahoo, which are investing time and resources to develop real-time processing and communication systems. Facebook’s Wormhole keeps its many collaborating and distributed systems apprised of data changes to ensure operational uniformity. Wormhole is quite literally acting as a real-time nervous system for Facebook. Yahoo’s Storm-YARN project aims to bridge the gap between big data at rest and real-time updates as they come into the system. Twitter and LinkedIn have each open-sourced sophisticated stream-processing frameworks.
These efforts and others reveal a shift in focus from batch processing to real-time processing and communication amongst top-tier tech companies, and there are strong indications that the broader market is poised to follow them.
We’re still in the very early days of big data.
We’ll look back on today’s batch-oriented point of view as quaint. Tomorrow, our smartest systems will instantly and masterfully oversee, optimize, manage and adapt processes that are as critical as they are complex, across a broad spectrum of industries, having taken their cues from the most ready example of a natural supercomputer we have: the human brain.
John Funge is cofounder and CEO of streaming data analytics platform provider BrightContext. He previously cofounded and built two successful technology firms, both of which were acquired by larger public companies: Clara Vista, acquired by CMGI in 1999, and Pickle.com, acquired by Scripps Networks in 2007. He also advises and invests in early-stage Internet and digital media ventures.