Unpublished Paper
Mining Relational Structure from Millions of Books
(2011)
  • David A. Smith
  • R. Manmatha, University of Massachusetts - Amherst
  • James Allan
Abstract

Existing large-scale scanned book collections have many shortcomings for data-driven research, from OCR of variable quality to the lack of accurate descriptive and structural metadata. We argue that complementary research in inferring relational metadata is important in its own right to support use of these collections and that it can help to mitigate other problems with scanned book collections.

Keywords
  • Digital libraries
  • Design
  • relational metadata
  • partial duplicate detection
Publication Date
2011
Comments
This is the pre-published version harvested from CIIR.
Citation Information
David A. Smith, R. Manmatha and James Allan. "Mining Relational Structure from Millions of Books" (2011)
Available at: http://works.bepress.com/r_manmatha/8/