Unpublished Paper
Toward a Framework for the Large Scale Textual and Contextual Analysis of Government Information Declassification Patterns
(2011)
  • Rachel Shorey
  • Hanna M. Wallach, University of Massachusetts - Amherst
  • Bruce Desmarais
Abstract
The U.S. government protects a massive number of documents as part of its Official Security Classification System. According to NARA's Information Security Oversight Office, in 2008 over 2.3 million documents were classified, over 5.1 million pages were declassified, and the administration of the classification system cost approximately $8.6 billion. The scope of government secrecy is determined by the quantity of text that is classified and the duration for which that information remains secret. We focus on the duration of classification by investigating the decision to declassify formerly secret documents.
We design and implement a statistical approach for studying the declassification process. Our approach permits researchers to assess the effects of the context and content of documents on the duration for which they are classified. A useful property of our approach is that analysts need access only to declassified documents to analyze the choices of government officials regarding still-classified information. We face an obvious sample-selection challenge in studying the decision to declassify a document: we only have access to documents that have been declassified. However, given the classification/origination and declassification dates, we note that declassification can be studied as a truncated survival process. A document makes it into our sample only if it is declassified before the study date, which is a well-defined truncation condition that we model explicitly. This approach permits inference on the declassification process given access only to declassified documents.
We examine the content-based determinants of declassification. In our approach, analysis of document content is performed automatically, using statistical topic modeling. Topic models automatically discover semantically coherent "topics" from the text of a document collection without specific human input.
We take a novel approach and combine topic modeling with survival analysis to estimate topic-specific and multi-topic-specific distributions of the duration of classification. The resulting estimates allow direct comparison and hypothesis testing regarding government secrecy related to specific topics. Our approach has two important uses. First, it is helpful to researchers who want to systematically study declassification decisions but only have access to already-declassified documents. Second, it can be helpful to government officials conducting declassification review by automatically proposing documents that are ripe for declassification based on their content and previous declassification practices. Since our approach distills reams of text into a manageable number of human-interpretable topics, we give declassification and transparency researchers a foothold into a formerly near-intractable problem. We study a corpus of over 85,000 documents from Gale's Declassified Documents Reference System. Placing these documents on common metrics, such as the duration, level, and timing of classification and declassification, we are able to systematically extract and directly compare the dynamics of government secrecy related to such historically relevant subjects as the Vietnam War, Cold War and post-Soviet Europe, and the civil rights movement in the U.S. We make two broad contributions in this work. First, we provide a method for scholars and government officials to quantitatively analyze the declassification process. Second, we provide a grounded comparison of the U.S. federal government's implementation of secrecy with regard to a diverse array of historical subjects.
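The truncation condition described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes, purely for illustration, that classification durations follow a single exponential distribution, and it uses simulated data. The function names (`simulate_declassified`, `truncated_loglik`, `fit_rate`) and all parameter values are hypothetical. Each document contributes its density divided by the probability that it would have been declassified before the study date, which is the standard likelihood for right-truncated data.

```python
import math
import random

def simulate_declassified(n_docs, rate, window, rng):
    """Simulate documents and keep only those declassified before the study date.

    Returns (duration, limit) pairs, where limit is the study date minus the
    origination date. All values are synthetic (assumption, not real data).
    """
    sample = []
    for _ in range(n_docs):
        limit = rng.uniform(1.0, window)   # years between origination and study date
        duration = rng.expovariate(rate)   # years the document stays classified
        if duration <= limit:              # truncation: otherwise never observed
            sample.append((duration, limit))
    return sample

def truncated_loglik(rate, sample):
    """Log-likelihood of an exponential rate under right truncation."""
    total = 0.0
    for duration, limit in sample:
        log_density = math.log(rate) - rate * duration
        # log P(duration <= limit) = log(1 - exp(-rate * limit))
        log_prob_observed = math.log(-math.expm1(-rate * limit))
        total += log_density - log_prob_observed
    return total

def fit_rate(sample, lo=1e-4, hi=5.0, tol=1e-7):
    """Maximize the truncated log-likelihood by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if truncated_loglik(c, sample) > truncated_loglik(d, sample):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return (a + b) / 2.0

rng = random.Random(0)
sample = simulate_declassified(10000, rate=0.1, window=40.0, rng=rng)
naive_rate = len(sample) / sum(d for d, _ in sample)  # ignores truncation
mle_rate = fit_rate(sample)                           # models truncation
print(f"naive: {naive_rate:.3f}  truncation-aware: {mle_rate:.3f}")
```

In this simulation the naive estimate overstates the declassification rate, because documents that stay classified past the study date never enter the sample. Conditioning on the truncation limit corrects for that using only the declassified documents.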
Publication Date
2011
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Rachel Shorey, Hanna M. Wallach and Bruce Desmarais. "Toward a Framework for the Large Scale Textual and Contextual Analysis of Government Information Declassification Patterns" (2011)
Available at: http://works.bepress.com/hanna_wallach/15/