Unpublished Paper
Confidence Estimation for Information Extraction
(2004)
  • Aron Culotta
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Information extraction techniques automatically create structured databases from unstructured data sources, such as the Web or newswire documents. Despite the successes of these systems, accuracy will always be imperfect. For many reasons, it is highly desirable to accurately estimate the confidence the system has in the correctness of each extracted field. The information extraction system we evaluate is based on a linear-chain conditional random field (CRF), a probabilistic model which has performed well on information extraction tasks because of its ability to capture arbitrary, overlapping features of the input in a Markov model. We implement several techniques to estimate the confidence of both extracted fields and entire multi-field records, obtaining an average precision of 98% for retrieving correct fields and 87% for multi-field records.
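The average precision figures quoted above measure how well a confidence score ranks correct extractions ahead of incorrect ones. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes each extracted field comes with a confidence score and a correctness label; the function name `average_precision` and the toy inputs are hypothetical.

```python
def average_precision(confidences, correct):
    """Average precision of the ranking induced by confidence scores.

    confidences: one float per extracted field (higher = more confident).
    correct: one bool per field, True if the extraction is right.
    """
    # Rank fields from most to least confident.
    ranked = sorted(zip(confidences, correct),
                    key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision over the top `rank` fields, taken at each correct field.
            precision_sum += hits / rank
    # With no correct fields the metric is undefined; return 0.0 by convention.
    return precision_sum / hits if hits else 0.0


# Toy data (made up for illustration): four fields, one incorrect.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [True, False, True, True]
ap = average_precision(scores, labels)
```

A confidence estimator that places every correct field above every incorrect one scores 1.0 under this definition, so the value rewards good ranking rather than raw extraction accuracy.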
Publication Date
2004
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Aron Culotta and Andrew McCallum. "Confidence Estimation for Information Extraction" (2004)
Available at: http://works.bepress.com/andrew_mccallum/41/