You may not realize it, but Twitter’s raw firehose of data is an unstructured mess of tweets, retweets, hashtags, snark confused as wit, follow activity, and much more. Fortunately, social data processing startup DataSift has complete access to Twitter’s pipe as well as a fresh $42 million in funding to sort all that data into something more beautiful and efficient.
Today DataSift is rolling out a system for defining custom parameters to look for in tweets. (Think of it sort of like highly advanced Gmail message filters for tweets, except these filters can be trained to understand and identify every piece of unique data within a tweet.) The technology, called VEDO, takes care of the heavy lifting of converting data from an unstructured format to a structured one. From there, business analysts can create visualizations or align the cleaned-up tweets with internal company data in order to get new insights.
The idea is to help customers more quickly apply machine learning across large amounts of unstructured data, said DataSift founder and CTO Nick Halstead in an interview with VentureBeat.
And don’t think the new capabilities are new technology for new technology’s sake. Customers have been demanding them, Halstead added.
To help companies get started, DataSift is offering up a library of example classifiers that can help make sense of the tweets. For example, in a demonstration Halstead showed how a classifier draws on a set of car models to figure out which manufacturers tweets are referring to.
“A lot of enterprises definitely don’t have this knowledge,” he said. “They can leverage our library of pre-built classifiers.”
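To make the car example concrete, here is a rough sketch of the idea in Python. It is not DataSift's code and it is not written in VEDO's own language, which the demonstration used. The lookup table, the function names, and the output fields are all hypothetical, chosen only to show how a list of car models could turn a free-form tweet into a structured record.

```python
import re

# Hypothetical lookup table: car model -> manufacturer (illustrative only,
# not taken from DataSift's classifier library).
MODEL_TO_MAKER = {
    "mustang": "Ford",
    "focus": "Ford",
    "civic": "Honda",
    "accord": "Honda",
    "corolla": "Toyota",
    "prius": "Toyota",
}


def tag_manufacturers(tweet_text):
    """Return a sorted list of manufacturers whose models appear in the tweet."""
    # Lowercase the text and split it into simple alphanumeric tokens.
    words = re.findall(r"[a-z0-9]+", tweet_text.lower())
    makers = {MODEL_TO_MAKER[w] for w in words if w in MODEL_TO_MAKER}
    return sorted(makers)


def structure_tweet(tweet_text):
    """Wrap the raw text and its tags in a structured record (assumed shape)."""
    return {"text": tweet_text, "manufacturers": tag_manufacturers(tweet_text)}


if __name__ == "__main__":
    print(structure_tweet("Just traded my Civic for a Mustang!"))
```

A real classifier would need to handle ambiguity (a "focus" is not always a Ford), which is where the trained, machine-learning side of the product comes in.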
Indeed, the company has existing enterprise customers to serve, including CBS Interactive, Dell, and Yum Brands.
DataSift has come a long way since it started working with Twitter to sell access to the firehose of tweets in 2011. It’s become a known-by-name vendor of data that companies want to analyze to understand what customers, competitors and commentators are thinking. But the company isn’t satisfied with that status.
DataSift is now looking beyond just the social networking data of which it’s been a key provider.
“You can imagine that this platform is agnostic, and we’ll be opening up very soon the ability to send any of your unstructured data thru DataSift,” Halstead said. In other words, DataSift intends to become a service provider for interpreting all kinds of messy data, not just a vendor of data with intelligence layered on top of it.
The user interface isn’t fancy. Data gets served up in a command line in response to users’ queries, and then it can be pushed into databases. Halstead said the company is working on making the tool accessible in a more visual format. That upgrade should be a bigger part of the conversation as the company shifts toward accepting more data inputs.
The technology boils down to a new programming language, Halstead said. It’s “an important starting block” for processing and understanding data, he said. At the same time, the bigger picture shows a company setting off in a whole new direction.