"Collective Segmentation and Labeling of Distant Entities in Information Extraction" by Charles Sutton and Andrew McCallum

Unpublished Paper

Collective Segmentation and Labeling of Distant Entities in Information Extraction

(2004)

Charles Sutton
Andrew McCallum, University of Massachusetts - Amherst

Abstract

In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a conditional random field (CRF) that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
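
The idea of tying together the labels of similar words can be illustrated with a small sketch. The abstract does not say how "similar" is defined or how the model is built, so everything below is an assumption made for illustration: the function name `skip_edges` is hypothetical, similarity is taken to mean an exact match between capitalized tokens, and the example sentence is invented. The sketch only finds the pairs of positions that such a model might connect; it does not implement the CRF, its training, or its inference.

```python
from collections import defaultdict
from itertools import combinations


def skip_edges(tokens):
    """Return index pairs (i, j), i < j, linking repeated capitalized tokens.

    Assumption (not from the abstract): two words count as "similar"
    when they are the same string and start with an uppercase letter.
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions[tok].append(i)

    edges = []
    for idxs in positions.values():
        # Every pair of occurrences of the same token gets one edge.
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


# Invented example sentence, tokenized on whitespace.
tokens = "Speaker : John Smith . Professor Smith will talk about CRFs .".split()
print(skip_edges(tokens))  # [(3, 6)]
```

In a model along the lines the abstract describes, each such pair would contribute a dependency between the two labels, in addition to the usual dependencies between neighboring words, so that evidence for one mention of "Smith" can influence the label of the other.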

Disciplines

Computer Sciences

Publication Date

2004

Comments

This is the pre-publication version harvested from the Center for Intelligent Information Retrieval (CIIR).

Citation Information

Charles Sutton and Andrew McCallum. "Collective Segmentation and Labeling of Distant Entities in Information Extraction" (2004)
Available at: http://works.bepress.com/andrew_mccallum/46/