"Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment" by Kedar Bellare

Selected Works of Andrew McCallum


Unpublished Paper

Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment

(2009)

Kedar Bellare
Andrew McCallum, University of Massachusetts - Amherst


Abstract

Traditionally, machine learning approaches for information extraction require human-annotated data, which can be costly and time-consuming to produce. In many cases, however, there already exists a database (DB) whose schema is related to the desired output and whose records are related to the expected input text. We present a conditional random field (CRF) that aligns the tokens of a given DB record with its realization in text. The CRF model is trained with generalized expectation criteria, using only the available DB and unlabeled text. The annotation of the text induced from the inferred alignments is then used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental results show a 35% error reduction over a previous state-of-the-art method that uses heuristic alignments.
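A generalized expectation (GE) criterion scores a model by how far its expected label distribution, averaged over the tokens where some feature fires, is from a target distribution. This is often measured as a KL divergence. The sketch below shows only that scoring step under simplifying assumptions. The per-token marginals are given as input rather than computed by a CRF. The labels are hypothetical citation field names rather than the alignment output space the abstract describes. The helper names `model_expectation` and `ge_kl_penalty` are illustrative and do not come from the paper. It is not the authors' training objective, which would also need gradients through CRF inference.

```python
import math
from typing import List, Sequence


def model_expectation(marginals: Sequence[Sequence[float]],
                      fires: Sequence[bool]) -> List[float]:
    """Average per-token label marginals over tokens where a feature fires.

    `marginals[i][k]` is assumed to be the model's probability that token i
    has label k (e.g. from CRF forward-backward, not computed here).
    """
    selected = [m for m, f in zip(marginals, fires) if f]
    if not selected:
        raise ValueError("feature never fires; expectation is undefined")
    n_labels = len(selected[0])
    return [sum(m[k] for m in selected) / len(selected) for k in range(n_labels)]


def ge_kl_penalty(target: Sequence[float], expected: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || expected): zero when the model matches the target exactly."""
    return sum(t * math.log((t + eps) / (e + eps))
               for t, e in zip(target, expected) if t > 0)


if __name__ == "__main__":
    # Hypothetical labels: [author, title, venue]; three citation tokens.
    marginals = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.6, 0.2]]
    # Hypothetical feature: "token also occurs in the DB record's title field".
    fires = [False, True, True]
    expected = model_expectation(marginals, fires)
    # Target: tokens matching the record's title should mostly be labeled title.
    target = [0.05, 0.9, 0.05]
    print(expected, ge_kl_penalty(target, expected))
```

Here the model assigns an average of 0.7 to the title label on tokens where the feature fires, against a target of 0.9. The resulting positive penalty (about 0.116) is the quantity a GE-trained model would push toward zero.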

Disciplines

Computer Sciences

Publication Date

2009

Comments

This is the pre-publication version harvested from CIIR.

Citation Information

Kedar Bellare and Andrew McCallum. "Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment" (2009)
Available at: http://works.bepress.com/andrew_mccallum/83/