Unpublished Paper
Collective Segmentation and Labeling of Distant Entities in Information Extraction
(2004)
  • Charles Sutton
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a conditional random field (CRF) that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
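To make the idea of linking "pairs of similar words" concrete, here is a minimal Python sketch of how such pairs might be collected from a tokenized document before building the model. The abstract does not define similarity, so the criterion used here (identical tokens that begin with an uppercase letter) is an assumption for illustration only, and the function name `skip_edges` and the sample sentence are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def skip_edges(tokens):
    """Return sorted index pairs (i, j), i < j, of tokens treated as similar.

    Assumption (not taken from the abstract): two tokens are similar when
    they are identical strings and start with an uppercase letter.
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions[tok].append(i)

    edges = []
    for idxs in positions.values():
        # Every pair of occurrences of the same word gets a long-range edge.
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


# Hypothetical example document, already split into tokens.
tokens = "Speaker : John Smith . Professor Smith will speak at 3 pm .".split()
print(skip_edges(tokens))  # [(3, 6)]
```

Each returned pair marks two positions whose labels a model of this kind would tie together, in addition to the usual dependencies between adjacent words.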
Publication Date
2004
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Charles Sutton and Andrew McCallum. "Collective Segmentation and Labeling of Distant Entities in Information Extraction" (2004)
Available at: http://works.bepress.com/andrew_mccallum/46/