Andy Jassy, senior vice president of Amazon Web Services, wasn’t kidding around last week when he hinted at Amazon’s plans to add big-data services to its already vast lineup. Today the company announced Kinesis, a new tool to accept real-time data, replicate it, and deliver it to applications running on Amazon.
With Kinesis, developers can get more creative about what to do with large amounts of data flowing in live. Developers building applications on Amazon’s cloud can now more easily take advantage of sensors collecting data, which is an important step as the Internet of Things picks up speed and consumers try out devices connected to the Internet.
“The Hadoop cluster world is great for analytics, or for actually processing large amounts of data, but it’s definitely not suitable for doing real-time operations,” said Werner Vogels, Amazon.com’s chief technology officer, building up to his announcement of Kinesis at Amazon Web Services’ re:Invent conference in Las Vegas. “We need to make it easier for anyone to do real-time operations.”
As data streams in, Kinesis replicates it across three availability zones, facilities that are separate from one another but close enough to keep latency between them low.
If a torrent of data suddenly comes in, Kinesis can scale up. If there’s a quiet period, Kinesis can automatically scale down. Data can include what users are clicking on, what sensors are picking up, what people are saying on social networks, whatever. The point is, Amazon is making it easier to make applications change in response to hot, fresh data. Or, if the data feeds internal analysis rather than a public-facing application, that analysis can now be more up to date.
Integration with existing Amazon data processing services is critical to the adoption of something like Kinesis, and indeed Amazon showed in an on-stage demo today how Kinesis can take live tweets and push them right into the DynamoDB NoSQL database to see which words are the most popular right now.
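The word-popularity demo can be approximated with a sliding window over incoming tweets. What follows is a minimal sketch in Python, not Amazon's demo code: it assumes tweets arrive as plain strings, it keeps counts in memory rather than in DynamoDB, and the names `SlidingWordCounter`, `window_size`, and `top` are hypothetical.

```python
import re
from collections import Counter, deque

WORD_RE = re.compile(r"[a-z0-9']+")


class SlidingWordCounter:
    """Counts words across the most recent `window_size` tweets (hypothetical helper)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # word lists for tweets still inside the window
        self.counts = Counter()  # running word counts across the window

    def add(self, tweet):
        """Add one tweet and expire the oldest one if the window is full."""
        words = WORD_RE.findall(tweet.lower())
        self.window.append(words)
        self.counts.update(words)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            # Remove words whose count fell to zero so they stop showing up.
            for word in set(expired):
                if self.counts[word] <= 0:
                    del self.counts[word]

    def top(self, n):
        """Return the n most common words currently in the window."""
        return self.counts.most_common(n)
```

In a real pipeline the strings passed to `add` would come from the stream's consumer, and the counts would be written to a datastore instead of held in a `Counter`.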
Another demo showed Kinesis sending two days’ worth of tweets into Amazon’s data warehouse, Redshift, which keeps historical analytics data. Analysis in Redshift helped answer questions not only about which planets are popular in tweets but also about why. People tweeted about Mars, but words such as “rock,” “performance,” and “concert” commonly appeared alongside “Mars.” It became obvious that Bruno Mars fans were the culprits behind all of this excitement about Mars.
Finally, a demo showed where all these tweets about Bruno Mars were coming from, with a visualization built on data from Kinesis and queried through Amazon’s RDS relational database service and its newly announced support for the PostgreSQL database engine.
The magic of Kinesis lies in its compatibility with popular Amazon services and its apparent simplicity; the application with all of these features took less than a week for two engineers to build.
The technical architecture behind Kinesis involves Amazon jargon such as streams, shards, data records, data blobs, and so on. Interested parties can read about those things — and the pricing model — in an Amazon blog post. And developers can now sign up for early access to a limited preview of the service.
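For readers who want a feel for that jargon before reading Amazon's post: in Amazon's documented model, a stream is divided into shards, and each data record carries a partition key plus an opaque data blob, with an MD5 hash of the key deciding which shard receives the record. The Python sketch below models that routing locally. It assumes, for simplicity, that the shards split the 128-bit hash space evenly, and `Record`, `shard_for`, and `route` are hypothetical names rather than part of any Amazon API.

```python
import hashlib
from dataclasses import dataclass

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


@dataclass
class Record:
    partition_key: str  # decides which shard the record lands on
    data: bytes         # opaque data blob; the stream does not interpret it


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def route(records, shard_count):
    """Group records into per-shard lists, preserving arrival order within a shard."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        shards[shard_for(record.partition_key, shard_count)].append(record)
    return shards
```

Because the shard is chosen from the key alone, records that share a partition key always land on the same shard, which is what lets a consumer see them in order.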
Bottom line: For companies that want to start putting data from sensors to work, Amazon just became more compelling.