Other
Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference
Computer Science Department Faculty Publication Series
  • Andrew McCallum, University of Massachusetts - Amherst
  • Ben Wellner
Publication Date
2003
Abstract

Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational: they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies, paralleling the advantages of conditional random fields over hidden Markov models. We present experiments on proper noun coreference in two text data sets, showing results in which we reduce error by nearly 28% or more over traditional thresholded record-linkage, and by up to 33% over an alternative coreference technique previously used in natural language processing.
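
To make the contrast between independent pairwise decisions and a relational treatment concrete, here is a minimal toy sketch. It is not the paper's model: the mention strings, the pairwise scores, and the partition objective (reward within-cluster scores, penalize across-cluster scores) are all assumptions invented for illustration, and the exhaustive enumeration of partitions is only feasible for a handful of mentions.

```python
from itertools import combinations

# Hypothetical mentions and pairwise compatibility scores (invented for
# illustration; positive means "probably the same entity").
MENTIONS = ["Mr. Powell", "Powell", "Mrs. Powell"]
SCORES = {
    ("Mr. Powell", "Powell"): 2.0,
    ("Powell", "Mrs. Powell"): 1.0,
    ("Mr. Powell", "Mrs. Powell"): -4.0,
}


def score(a, b):
    """Symmetric lookup of the assumed pairwise score."""
    return SCORES.get((a, b), SCORES.get((b, a), 0.0))


def independent_decisions(mentions, threshold=0.0):
    """Thresholded baseline: each pair is decided on its own, so the
    accepted links are not guaranteed to be transitive."""
    return {frozenset(p) for p in combinations(mentions, 2)
            if score(*p) > threshold}


def partitions(items):
    """Enumerate every set partition of a small list."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing block, or into a new block.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def partition_score(part):
    """Assumed joint objective: add a pair's score if the two mentions share
    a cluster, subtract it if they are split."""
    cluster_of = {m: i for i, block in enumerate(part) for m in block}
    total = 0.0
    for a, b in combinations(cluster_of, 2):
        s = score(a, b)
        total += s if cluster_of[a] == cluster_of[b] else -s
    return total


def best_partition(mentions):
    """Score whole partitions jointly, so the answer is always consistent."""
    return max(partitions(mentions), key=partition_score)


if __name__ == "__main__":
    print(independent_decisions(MENTIONS))
    print(best_partition(MENTIONS))
```

With these made-up scores, the independent baseline links "Mr. Powell" to "Powell" and "Powell" to "Mrs. Powell" while rejecting the "Mr. Powell" and "Mrs. Powell" pair, which is an inconsistent set of decisions. Scoring whole partitions instead forces a consistent answer, because the strongly negative pair influences how the other two pairs are resolved.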

Comments
This paper was harvested from CiteSeer
Citation Information
Andrew McCallum and Ben Wellner. "Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference" (2003)
Available at: http://works.bepress.com/andrew_mccallum/12/