Every tweet ever written is now available to search and analyze, thanks to Gnip

This morning, Gnip launched its Historical PowerTrack for Twitter, which will give developers the ability to search, find, analyze, and compare all the tweets ever written, even ones written before the developer in question started scraping Twitter.

It's the same level of access the Library of Congress got when it started archiving and storing all Twitter data, but this time, it's commercially available.

"There are a handful of companies that have collected some portion of Twitter data," said Gnip COO Chris Moody in a meeting with VentureBeat yesterday. "We were able to do it because we partnered with Twitter on it."

The data will make it possible for anyone (anyone working with Gnip, that is) to parse a huge historical archive of tweets and look for patterns. Did tweets have a real correlation to the 2008 election results? How about the iPhone 3GS launch and first-week sales? Using that kind of information, analysts can better forecast expected results for current events.

"Four and a half years ago, the company was founded on the idea that data would be insanely valuable, that people would do amazing shit with it," said Moody, "and we wanted to fuel all those applications."

We asked Moody the billion-dollar question: What if there turns out to be no or very little correlation? What if tweets turn out to be just so much hot air, totally useless for predictive analysis?

"We have many risky things about our business, but that's not one of them," he said. "We spend all day every day talking to people who are finding goldmines in this data. ... They're so excited and they're investing so much. Maybe we're deceiving ourselves, but if that turned out not to be the case, we'd fold up and go home, because we were founded on the idea that data is valuable."

Still, Gnip isn't doing the analysis itself and doesn't have too much control over where the magic happens. Calling his company's bread and butter "just the plumbing" of online data analysis, Moody said, "Some of our customers have a million-plus rules that they will use to filter data. ... If we miss a single tweet, that could cost a customer a million dollars."

But if Gnip could make its own use cases, the company would probably focus less on sales forecasting and more on global events with huge human impact. It would try to predict where the next political revolution is going to happen or how the election will turn out.

"We are a commercial entity; our services do cost money," Moody said. "But some people have approached us -- a PhD candidate studying epidemic outbreaks, for example -- and unfortunately a lot of that stuff we can't serve today. ... [But] we're always fascinated by those; they always make the most fascinating use cases to talk about.

"If you're Miami, and you know you're going to get a hurricane, you can look at social data and model the kinds of questions and concerns people have, evacuation route planning, you can figure out how people are exiting the city with location data. We had a big fire outside of [Gnip's hometown] Boulder [Colorado] a couple of years ago. ... One company mashed up geocoded Twitter data with geocoded Flickr data and was able to be an extra set of eyes for emergency responders."

But how Gnip's historical Twitter data gets used is up to its customers. Gnip considers itself responsible solely for being a fully reliable, fully compliant steward of that data -- and that means removing private or deleted tweets from the results each time a customer runs a request.

"It takes a lot of horsepower," said Moody. "In one dimension, you have a lot of data, and then you have to filter it really quickly -- no one actually wants all the data -- and then you have to assume that we'll be inundated with requests. ... So it has to scale in many directions. It's an incredibly difficult task, and it really involves a lot of proprietary technology that we developed to take this on."

Gnip does plan to start sharing bits and pieces of its architecture -- think less of pretty but useless diagrams and more of highly useful information that might help others plan their own architecture and products. Moody said the company might even open-source some of its tech someday soon.

When it comes to the data and the tech that powers it, Moody concludes, "We're half a percent into the journey of what is possible. ... The platforms themselves are maturing. The way people use Twitter today is very different from a few years ago. And now, you have this full data set to operate on."

Top image courtesy of Ilse, Flickr
