Metadata Cataloging, Storage, and Retrieval of Multilingual Motion Picture Subtitles: An XML Digital Library
  • Helena Marvin, University of Missouri-Saint Louis
  • Kimmy Szeto
The popularity of motion pictures in digital form has increased dramatically in recent years, and the global entertainment market has driven demand for subtitles in multiple languages. This paper investigates the informational potential of aggregating a corpus of multilingual subtitles in a digital library. Subtitles are extracted from commercial DVD releases and downloaded from the internet. These subtitles and their bibliographic metadata are then incorporated into an XML-based database structure. A digital library prototype is developed to provide full-text searching and browsing of the subtitle text with single- or parallel-language displays. The resulting product includes a set of tools for subtitle acquisition and a web browser-based digital library prototype that is portable, extensible, and interoperable across computing platforms. The functionality of this prototype is discussed in comparison to another subtitle corpus created for computational linguistics studies. Several potential uses of this digital library prototype are identified: as an educational tool for language learning, as a finding aid for citations, and as a gateway to additional temporal access points for video retrieval.
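As a rough illustration of the pipeline the abstract describes (SRT subtitles stored with bibliographic metadata in XML, then searched as full text), the following is a minimal sketch only. The element and attribute names (`subtitles`, `cue`, `title`, `lang`, `start`, `end`) and the functions `srt_to_xml` and `search` are hypothetical and are not taken from the paper's actual schema or tools.

```python
import re
import xml.etree.ElementTree as ET

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def srt_to_xml(srt_text, title, lang):
    """Convert SRT text to an XML tree (hypothetical schema, for illustration)."""
    root = ET.Element("subtitles", {"title": title, "lang": lang})
    # SRT cues are separated by blank lines.
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # need an index line, a timing line, and some text
        match = TIMING.match(lines[1])
        if not match:
            continue  # skip malformed cues rather than guess at timings
        cue = ET.SubElement(
            root,
            "cue",
            {"n": lines[0], "start": match.group(1), "end": match.group(2)},
        )
        # Join multi-line subtitle text into a single searchable string.
        cue.text = " ".join(lines[2:])
    return root


def search(root, term):
    """Case-insensitive full-text search over cue text."""
    term = term.lower()
    return [cue for cue in root.iter("cue") if term in (cue.text or "").lower()]


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello, world.

2
00:00:04,000 --> 00:00:06,000
Where are we
going?
"""

doc = srt_to_xml(SAMPLE, title="Example Film", lang="en")
hits = search(doc, "going")
for cue in hits:
    print(cue.get("start"), cue.text)
```

Because each cue keeps its start and end times as attributes, a text match also yields a time offset into the film, which is the kind of temporal access point the abstract mentions for video retrieval.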
  • subtitles
  • digital library
  • cataloging
  • XML
  • SRT
  • motion pictures
  • metadata
Publication Date
Master of Library and Information Studies
Field of study
Library and Information Studies
Graduate School of Library and Information Studies
Colleen Cool
Citation Information
Helena Marvin and Kimmy Szeto. "Metadata Cataloging, Storage, and Retrieval of Multilingual Motion Picture Subtitles: An XML Digital Library" (2010)
Available at: