Unpublished Paper
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
(2009)
  • Kedar Bellare
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Traditionally, machine learning approaches for information extraction require human-annotated data that can be costly and time-consuming to produce. However, in many cases, there already exists a database (DB) with a schema related to the desired output, and records related to the expected input text. We present a conditional random field (CRF) that aligns tokens of a given DB record and its realization in text. The CRF model is trained using only the available DB and unlabeled text with generalized expectation criteria. An annotation of the text induced from inferred alignments is used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental results demonstrate an error reduction of 35% over a previous state-of-the-art method that uses heuristic alignments.
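The step in which an annotation of the text is induced from alignments can be illustrated with a minimal sketch. It is not the authors' implementation: the record, the citation string, the `O` label for unaligned tokens, and the function names `record_tokens` and `project_labels` are all hypothetical, and the alignment is written by hand where the paper would use the alignment inferred by the CRF.

```python
from typing import Dict, List, Optional, Tuple


def record_tokens(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Flatten a DB record into (token, field_name) pairs, in field order."""
    return [(tok, field) for field, value in record.items() for tok in value.split()]


def project_labels(
    record: Dict[str, str],
    alignment: List[Optional[int]],
    default: str = "O",
) -> List[str]:
    """Label each text token with the field of the record token it is aligned to.

    alignment[i] is the index of the record token aligned to text token i,
    or None when text token i has no counterpart in the record.
    """
    rec = record_tokens(record)
    return [rec[j][1] if j is not None else default for j in alignment]


# Hypothetical DB record and a citation string that realizes it.
record = {"author": "Kedar Bellare", "title": "Record Text Alignment", "year": "2009"}
text = "K. Bellare . Record Text Alignment . 2009".split()

# Hand-written alignment (assumption): stands in for an inferred alignment.
alignment = [0, 1, None, 2, 3, 4, None, 5]

labels = project_labels(record, alignment)
for token, label in zip(text, labels):
    print(f"{token}\t{label}")
```

The resulting (token, label) pairs are the kind of induced annotation on which a sequence extractor could then be trained; punctuation that has no counterpart in the record receives the default label.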
Publication Date
2009
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Kedar Bellare and Andrew McCallum. "Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment" (2009)
Available at: http://works.bepress.com/andrew_mccallum/83/