(AOL has responded, saying they screwed up, and has taken the data down. More in the update here.)


Here are some excerpts from a post by Adam D’Angelo, over at Caltech, about AOL Research’s efforts to engage with the research community. Does anyone else think they’ve gone over the line with this?

AOL just released the logs of all searches done by 500,000 of their users over the course of three months earlier this year. That means that if you happened to be randomly chosen as one of these users, everything you searched for from March to May (2006) is now public information on the internet.

…The data is “anonymized”, which to AOL means that each screenname was replaced with a unique number. “It is still a research question how much information needs to be anonymized to protect users,” says Abdur from AOL. Here are some examples of what you can find in the data:

Among user 545605’s searches are “shore hills park mays landing nj”, “frank william sindoni md”, “ceramic ashtrays”, “transfer money to china”, and “capital gains on sale of house”….I’m leaving out the worst of it - searches for names of specific people, addresses, telephone numbers, illegal drugs, and more. There is no question that law enforcement, employers, or friends could figure out who some of these people are….I hope others can find more examples in the data, which is up for download over here (scroll down to the 500Kusers.tgz file).
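To see why swapping each screen name for a number does so little, here is a minimal Python sketch of that kind of "anonymization". It is an illustration under our own assumptions, not AOL's actual code or file format: the function names, the screen names `user_a` and `user_b`, and the "weather" query are hypothetical, and the other queries are the ones quoted above. The point is that a stable number still ties every query from one person together.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Replace each screen name with a stable, arbitrary number.

    `log` is a list of (screen_name, query) pairs. This mirrors the
    approach described in the excerpt, not AOL's real pipeline.
    """
    ids = {}
    next_id = count(1)
    released = []
    for screen_name, query in log:
        if screen_name not in ids:
            ids[screen_name] = next(next_id)
        released.append((ids[screen_name], query))
    return released


def profiles(released):
    """Group the released queries by user number, as any reader could."""
    grouped = defaultdict(list)
    for user_id, query in released:
        grouped[user_id].append(query)
    return dict(grouped)


# Hypothetical input: the screen names are invented for illustration.
log = [
    ("user_a", "shore hills park mays landing nj"),
    ("user_b", "weather"),
    ("user_a", "capital gains on sale of house"),
    ("user_a", "transfer money to china"),
]

released = pseudonymize(log)
by_user = profiles(released)

# The screen name is gone, but one number still links a location,
# a house sale, and a money transfer to the same person.
for user_id, queries in by_user.items():
    print(user_id, queries)
```

The number hides the screen name, but the queries themselves carry the identifying details, and the shared number lets anyone assemble them into a profile.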

If you go to the site, there’s even a person thanking AOL for this info in the comments. We haven’t looked at this very closely yet, and haven’t talked with AOL. But so far, we’re cringing.


7 Trackbacks

  1. UMBC eBiquity said:

    Does AOL’s search data compromise privacy?

  2. August 6th, 2006
    10:38 pm

    A Day in the Life of an Information Security Investigator said:

    AOL Blows It: Releases Search Data on 500,000 Users!

    AOL, what the %#$@$@ were you thinking? You provide a 440MB file of search queries from 500,000 of your customers for anyone to download? Your idea of ‘de-identifying’ the data is to replace the screen name with an arbitrary number?…

  3. fredshouse said:

    AOL discloses 650,000 AOL users’ search data

    Well this isn’t going to help AOL’s image. Over the weekend, AOL researchers posted a 400MB+ tarball of the raw search query data of some 650K AOL users over the period from March 1, 2006 to May 30, 2006. While…

  4. SiliconBeat said:

    AOL responds to data leak. They screwed up.

    John Battelle has gotten an early response from AOL about the data leak that we posted about early yesterday. Here’s the summary: This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic c…

  5. August 7th, 2006
    11:46 am

    Zoli's Blog said:

    AOL Just Did the Unthinkable - Boycott AOL?

    (Updated) Thank you, Google, for resisting the DOJ’s effort to obtain user search data. You put up a good fight to protect our privacy, and

  6. Platinax News said:

    AOL’s huge data blunder

    SPECIAL REPORT
    AOL have released a big chunk of user data to the internet in a huge blunder.
    The data was a record of 20 million searches on the AOL search engine, carried out by 650,000 AOL users over March to May of this year.
    The data was pr…

  7. Research said:

    Research

    The network promotes synthesis and comparative Search through our extensive index. JupiterResearch provides unbiased rese…

4 Comments

  1. Search Engines WEB said:

    http://research.microsoft.com/ur/us/fundingopps/RFPs/Search_2006_RFP.aspx

    a few months ago - Microsoft launched an analogous project

  2. August 6th, 2006
    10:59 pm

    breakingranks said:

    I dare you to compare this to the HIPAA standard of “de-identified” information that health organizations are now using as the standard to release data.

  3. August 8th, 2006
    12:29 am

    daniel said:

    if you don’t want to download 2 gigs and grep your way through, here’s a site that’ll let you search from a database: http://www.aolsearchdatabase.com .

  4. Bob said:

    search engine proxies have been around for at least a few years. Why don’t people start using them?

    here’s a free one. http://www.blackboxsearch.com
