Unpublished Paper
Table Extraction for Answer Retrieval
(2004)
  • Xing Wei
  • Bruce Croft
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
The ability to find tables and extract information from them is a necessary component of question answering and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multidimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional retrieval techniques. This paper describes techniques for extracting tables from text and retrieving answers from the extracted information. We compare machine learning (especially conditional random fields) and heuristic methods for table extraction. For each table cell, our approach creates a cell document containing the cell and its metadata (headers, titles); the retrieval model then ranks the cells of the extracted tables using a language modeling approach. Performance is tested using government statistical Web sites and news articles, and errors are analyzed in order to improve the system.
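The cell-document idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the toy table, and the choice of Dirichlet-smoothed query likelihood (with an arbitrary `mu`) are all assumptions made for illustration, since the abstract says only that a language modeling approach is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_cell_documents(title, col_headers, row_headers, rows):
    """Create one 'cell document' per table cell: the cell text plus
    its metadata (table title, row header, column header)."""
    docs = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            text = " ".join([title, row_headers[r], col_headers[c], cell])
            docs.append({"row": r, "col": c, "cell": cell,
                         "tokens": tokenize(text)})
    return docs


def rank_cells(query, docs, mu=10.0):
    """Rank cell documents by query likelihood.
    Dirichlet smoothing is an assumption, not taken from the paper."""
    collection = Counter(t for d in docs for t in d["tokens"])
    total = sum(collection.values())
    scored = []
    for d in docs:
        tf = Counter(d["tokens"])
        n = len(d["tokens"])
        score = 0.0
        for q in tokenize(query):
            p_coll = collection[q] / total
            p = (tf[q] + mu * p_coll) / (n + mu)
            # Terms unseen in the whole collection get a small floor.
            score += math.log(p) if p > 0 else math.log(1e-12)
        scored.append((score, d))
    # Sort on the score only, so dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy table (invented data, for illustration only).
docs = build_cell_documents(
    title="Corn production 2003",
    col_headers=["acres", "bushels"],
    row_headers=["Iowa", "Ohio"],
    rows=[["100", "200"], ["300", "400"]],
)
ranked = rank_cells("Iowa bushels", docs)
best = ranked[0][1]
```

In this toy case the cell whose row and column headers both match the query terms scores highest, which is the intuition behind attaching metadata to every cell before retrieval.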
Keywords
  • Table extraction
  • conditional random fields
  • question answering
  • information extraction
Publication Date
2004
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Xing Wei, Bruce Croft and Andrew McCallum. "Table Extraction for Answer Retrieval" (2004)
Available at: http://works.bepress.com/andrew_mccallum/44/