Without stream processing, there’s no big data and no Internet of things

The Internet of things has been on everyone’s lips (and pens) lately, as the biggest new source of money, customer product innovation, investment opportunities, and sci-fi-worthy stories (like those of everyone who’s part of the Cyborg Foundation; really, look it up).

The truth is the value of the Internet of things only comes from the astounding mass of data it’s bound to produce, and there will be no money made until the data-processing market is ready to take on the challenge. Many believe that, just as database management needed Oracle and big data needed Hadoop, the Internet of things (or the really big data) needs stream processing.

Stream processing is a technology that allows for the collection, integration, analysis, visualization, and system integration of data, all in real time, as the data is being produced, and without disrupting the activity of existing sources, storage, and enterprise systems.

We ourselves have been talking about the value of stream processing for big data for a while now. Other companies operating in stream processing are Cloudera, which is working with the open-source Apache Spark project; Hortonworks, which supports the open-source Apache Storm project; and cloud provider Amazon Web Services, with its Kinesis service. We are aware of a few other initiatives, but they’re not yet crystallized in full, productive, continuous projects.

No matter which vendor companies choose, though, the Internet of things holds growth projections to make everyone’s head spin: Gartner, for example, predicts that the Internet of things and personal computing will unearth more than $1.9 trillion in revenue before 2020; Cisco thinks there will be upwards of 50 billion connected devices by the same date; IDC estimates technology and services revenue will grow worldwide to $7.3 trillion by 2017 (up from $4.8 trillion in 2012). That sounds fantastic for all the consumer, manufacturing, and government industries — but what does it all mean, and why can’t we make it happen sooner?

The answer is pretty simple: the data-processing market ain’t there yet. To be clear, all of these connected devices will generate unimaginable amounts of data, and all of this data will have to end up passing through data-processing entities. Any Internet of things initiative will require a tailored data-processing strategy that balances current requirements, continuous growth, and future applications. That can only be done through very powerful processors that enable collaboration between devices, analytics platforms, customers, and real-world systems. In other words, let everything speak to everything in the same language, and compile and analyze all conversations in real time.

Let’s talk examples. For one, home automation. At the level of an individual home, this isn’t a big data problem; it’s about usability of the end-user applications. (Does your fridge really order milk for you, in response to your diet app?) But once many millions of homes are connected to the same service, there is a real opportunity for monetization — consumer behavior, appliance behavior, real-time/customized ad placement, and so on — and a scaling problem for the data processing entity.

Also, smart cities: parking and traffic apps already exist today, and they are not an especially big big-data problem. But connecting bus, train, and road information that’s accurate in real time; adding video and weather data; connecting info from yesterday, last week, and last month; cross-referencing with real-time demographics; and offering this information through different channels accessible to both businesses and consumers (the same ones who keep smart homes): that gets us closer to the sheer complexity of the Internet of things/big data problem.

If on top of it all we add the layer of data security, we’re now talking really big data. Today’s control systems are not set up for wider access, SCADA has limitations, and wireless access requires SIM cards. But the greater the intelligence, the more sophisticated the attacks. Alerts require usage and pattern analysis for all data, across systems, in real time.

Only stream processing can handle this job, and here’s why:

  • Systems of systems require continuous collection, filtering, and aggregation of data. That means all the data is ingested and “translated” as it’s being produced; no data is waiting at the door for its turn to enter the system. Nothing gets lost, overlooked, or outdated, because the variety of data is not an issue.
  • Analytics happens through incremental computation (pretty cool). That means the system remembers the query, and every time the data changes, the answer changes based on the delta (and not the total amount of data, which saves precious seconds). It allows for staggering volumes of data processed in very little time — in fact, millions of events per second per server core.
  • The results of the analytics are translated and fed back into local systems in real time, which means the distance between data coming in and data coming out can be as low as a few milliseconds.
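
To make the incremental-computation point concrete, here is a minimal sketch in Python. It is not taken from any particular stream-processing product; the class name SlidingAverage, the ten-second window, and the sample readings are all assumptions made for illustration. The idea it shows is the one described above: the query (an average over a recent time window) stays registered, and each new event adjusts the answer by a delta instead of recomputing over all the data.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingAverage:
    """Hypothetical standing query: average of values seen in the last
    `window_seconds`, updated once per incoming event."""
    window_seconds: float  # assumed to be greater than zero
    _events: deque = field(default_factory=deque, init=False)  # (timestamp, value)
    _total: float = field(default=0.0, init=False)

    def update(self, timestamp: float, value: float) -> float:
        # Apply the delta for the newly arrived event.
        self._events.append((timestamp, value))
        self._total += value

        # Retract events that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value

        # The answer comes from the running total, not a full rescan.
        return self._total / len(self._events)


# Illustrative readings (timestamp in seconds, sensor value); made up for this sketch.
query = SlidingAverage(window_seconds=10)
for ts, reading in [(0, 10), (5, 20), (12, 30), (30, 4)]:
    print(ts, query.update(ts, reading))
```

Each call to update does work proportional to the number of events entering or leaving the window, not to the size of the history, which is why this style of computation can keep up as volumes grow. A production engine would also have to deal with out-of-order events, many keys, and state recovery, none of which this sketch attempts.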

As with every technology, stream processing, too, will some day need rethinking and updates. Until then, however, the evolution of the Internet of things, and its ability to make money, will remain tightly related to the adoption rates of stream-processing technologies.

Dana Sandu is market evangelista at SQLstream.


1 comment
Stewart Hanna

Dana, absolutely agree with your post. But don't forget that IBM InfoSphere Streams has been in government and commercial production deployment for 9+ years across hundreds of clients. Spark, Storm, Kinesis and others like them are only just waking up to the need for speed.