By Reece Sellin
It’s certainly plausible that all would have seemed entirely innocuous – after all, at an initial glance, search data divorced from information that would connect it to specific users would appear to be very informative, but also very much anonymous. Such seemed to be the line of thinking at America Online’s (AOL’s) research division in early August, when it publicly released a previously private set of search data produced by some 658,000 users from March to May of this year.
It’s hard to know what AOL researchers thought when it was revealed – by the New York Times – that the search data wasn’t as anonymous as most may have suspected. It only took the New York Times a day or two to discover the identity of at least one formerly anonymous user: Thelma Arnold, a 62-year-old widow living in Lilburn, Georgia. Apparently she “…had no idea somebody was looking over [her] shoulder” while searching the web.
Adding a further dimension of interest to what many have since branded a “fiasco” is AOL’s subsequent characterization of the event as “a screw-up.” Although AOL removed the data and apologized for its release, a pair of students had already set up a database-driven website that stored and indexed the searches while they were still available. That site? It remains on-line for all to access at will.
Although Ms. Arnold’s searches were relatively mundane, those of a number of other AOL users certainly weren’t so innocent. At least two unidentified users reportedly used the search facility to seek out child pornography, while other individual users researched things such as “how to make a meth lab,” “torture methods,” and “how to grow pot.”
Predictably, both the philosophical and legal implications of the situation have already prompted fairly extensive public debate. On the one hand are privacy advocates, armed with proven search-to-real-identity links based on the data. They are challenging those who, on the other hand, felt (initially at least) that the data was both safe and an excellent tool for the “research community.”
Of course, as often seems to be the case with such issues, there is a “public safety” element, too – although by no means guaranteed, it is entirely possible that some of those with the most “concerning” search patterns may be investigated by law enforcement officials. It also seems plausible that the public release of the search engine data creates a civil rights “vacuum” for those individuals – after all, one need not obtain a warrant to access information that is already in the public domain.
To present Ping! readers with a bit more context on some of the larger issues, particularly in terms of how the “fiasco” affects those in the web hosting industry, I conducted a brief interview with Dr. Michael Geist, a leading Internet Law expert, who is also the Canada Research Chair of Internet and E-commerce Law at the University of Ottawa (http://www.michaelgeist.ca). That interview follows:
Reece Sellin: “Shortly after the data was released, there seemed to be quite a few ‘sceptics,’ as I believe you called them, some of whom quite strongly expressed their view that the information couldn’t be linked to individual users. Do you think the fact that they were very clearly proven wrong by the New York Times is somehow indicative of a larger issue — namely that individuals — even those intimately involved in researching search data — don’t take on-line privacy issues seriously enough?”
Dr. Michael Geist: “I think that’s a fair statement. It is certainly the case that the frailty of personal information — whether exposed by hackers, security breaches, lax privacy rules, or simply large scale data collection — is under-appreciated by users and many tracking the issue.”
RS: “Looking at the larger issues, do you think this ‘AOL Fiasco’ will be an event that serves to make Internet users in general more aware about their on-line privacy? You are probably aware of John Battelle’s comment that ‘the silver lining of a data leak like this is that it allows the culture to have a conversation about what we’re getting into here by tracking all this data.’ Do you see the AOL Fiasco as contributing to that ‘conversation,’ or is it still too minimal an issue to capture attention beyond those already interested in such privacy issues?”
MG: “I think it will attract attention in the same way that the Sony rootkit fiasco of last fall crystallized concerns around DRM [Digital Rights Management]. I don’t think that one incident will necessarily galvanize the entire public, but it does serve to get more people engaged on these issues.”
RS: “There seems to be a rather interesting contrast regarding the privacy of user searches coming to light. On the one hand, there is Google, who seemed to stand quite alone in refusing to reveal search data after receiving a US Justice Department subpoena for such — while on the other, we have companies like AOL who seem (or at least seemed) to be quite willing to take things even one step further, and give the data to the public. Do you have any thoughts on where that may lead, particularly in terms of public trust of various search engine firms?”
MG: “I think that the public will increasingly expect intermediaries such as search engines and ISPs to represent their privacy interests. Those that do – Google in your example – will reap the benefits, while those that don’t – even if they follow the letter of the law – will face a consumer backlash.”
RS: “Finally, do you think there are any lessons in this ‘fiasco’ for web hosting companies, which for the most part tend to be relatively small operations, at least compared to a company like AOL? Moreover, now that the data is offline on the AOL side, how does AOL go about having others take it down too, particularly considering that the data seems to have been released with quite minimal terms regarding its use?”
MG: “I don’t think you can take it down. This speaks to the frailty of online privacy – once the data is out on the network, it is invariably copied or cached and is next to impossible to retrieve. The privacy genie can seemingly never be stuffed back into the bottle.”
Writer’s Bio: Reece Sellin is the Senior Editor of Ping! Zine, and the Chief Content Engineer at Net Logistics Pty. Ltd. (http://www.netlogistics.com.au), a leading Australian web hosting firm. He hails from the Great White North (also known as Canada).
Source: Ping! Zine Issue 18