Unpublished Paper
Nonsensitive in Isolation, Sensitive in Aggregation: The Importance of Data Granularity for Privacy
(2013)
  • Clarissa D. Simon
  • Raizel Liebler, The Learned Fangirl
  • Keidra Chaney
Abstract
Users are producing potentially personally identifiable data online in increasing amounts and at an astounding rate. This information includes online search results and tracking information gathered through cookies and other tools.
 
In the United States, there are standards for anonymizing information in traditional research involving human subjects. Examples include HIPAA (the Health Insurance Portability and Accountability Act of 1996) and the review of traditional statistical studies by institutional review boards (IRBs). However, no current model adequately protects all of the potentially personally identifiable information we constantly produce online. Much of the debate over online information privacy turns on the assumption that shared information is generally anonymous. Yet several controversies have shown that researchers can determine an individual's identity from a large data set that was assumed to contain only anonymous data. Early signs of this problem appeared in 2006, when a New York Times reporter determined the identity of a specific AOL search user (“A Face Is Exposed for AOL Searcher No. 4417749,” N.Y. Times, August 6, 2006). Seemingly anonymous individuals can also be identified on a much larger scale.
 
The more recent controversy over a data set of Facebook friends shows how easily researchers, and the general public, can combine information from different sources to identify people for the first time or to deanonymize information (Marc Parry, “Harvard Researchers Accused of Breaching Students’ Privacy,” Chronicle of Higher Education, July 10, 2011).
 
It was recently reported that bloggers who use Google Analytics may be individually identified through their Google Analytics ID information, which puts their anonymity at risk. This concern about identifying online users follows growing concern in the medical and public health literature about the identification of individuals in health databases.
 
Health data from multiple labs are increasingly integrated into open-access translational research information systems (OTRIS). These systems give researchers a broad range of datasets, which they can use to reanalyze existing studies and to analyze data in new ways. The methods used to keep medical information anonymous can help inform models for online data privacy. Our paper builds on the insight of Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” 57 UCLA L. Rev. 1701 (2010). Ohm shows how anonymization efforts fail to prevent reidentification through a variety of techniques. He concludes that “Data can be either useful or perfectly anonymous but never both.” (Id. at 1704).
 
Public health uses the standards set under HIPAA to balance the competing uses of data. Drawing on those standards, we propose a sliding-scale model for standardizing which types of collected information may be recombined. The model weighs the information's potential usefulness both to the individual who provides it and to the public.
Publication Date
2013
Comments
Presented at Midwest Law & Society Colloquium
Citation Information
Clarissa D. Simon, Raizel Liebler and Keidra Chaney. "Nonsensitive in Isolation, Sensitive in Aggregation: The Importance of Data Granularity for Privacy" (2013)
Available at: http://works.bepress.com/raizelliebler/23/