Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditionalprobability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational—they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies— paralleling the advantages of conditional random fields over hidden Markov models. We present experiments on proper noun coreference in two text data sets, showing results in which we reduce error by nearly 28% or more over traditional thresholded record-linkage, and by up to 33% over an alternative coreference technique previously used in natural language processing.
Available at: http://works.bepress.com/andrew_mccallum/12/