"Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function" by Aron Culotta

Selected Works of Andrew McCallum

Unpublished Paper

Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function

(2007)

Aron Culotta
Pallika Kanani
Robert Hall
Michael Wick
Andrew McCallum, University of Massachusetts - Amherst

Abstract

Author disambiguation is the problem of determining whether records in a publications database that contain similar author names refer to the same person. This task can be especially difficult when the database is constructed from automatically extracted data, which can contain noisy and incomplete records. A common supervised machine learning approach to author disambiguation is to build a classifier that predicts whether a pair of records is coreferent, often followed by a collective inference step to enforce transitivity of the predictions. By restricting the classifier to pairwise predictions, standard training algorithms for binary classification can be used. However, this approach ignores powerful evidence that can be obtained by examining sets (rather than pairs) of records, such as the number of publications or co-authors an author has. In this paper we propose a representation that enables these first-order features over sets of records. We also propose a training algorithm well-suited to this representation that is (1) error-driven in that training examples are generated from incorrect predictions on the training data, and (2) rank-based in that the classifier induces a ranking over candidate predictions. We evaluate our algorithms on three author disambiguation datasets and demonstrate error reductions of up to 60% over the standard binary classification approach.
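
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the record layout (a dict with a "coauthors" list), the two example features, and the perceptron-style update are all assumptions chosen for illustration. The first function shows the pairwise baseline's transitivity step, the second shows features computed over a set of records, and the third shows one plausible form of an error-driven, rank-based update.

```python
def transitive_closure(n, coreferent_pairs):
    """Merge records linked by positive pairwise predictions (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def set_features(cluster, records):
    """Features over a whole set of records (hypothetical examples).

    A pairwise classifier cannot see these, because they depend on
    every record in the candidate cluster at once.
    """
    coauthors = set()
    for i in cluster:
        coauthors.update(records[i]["coauthors"])
    return {
        "num_publications": float(len(cluster)),
        "num_coauthors": float(len(coauthors)),
    }


def score(weights, feats):
    """Linear score of a candidate prediction."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def ranking_update(weights, better, worse, lr=1.0):
    """Assumed perceptron-style ranking step.

    Updates only on an error, i.e. when the worse candidate is not
    scored strictly below the better one. Returns True if it updated.
    """
    if score(weights, better) > score(weights, worse):
        return False
    for k in set(better) | set(worse):
        delta = better.get(k, 0.0) - worse.get(k, 0.0)
        weights[k] = weights.get(k, 0.0) + lr * delta
    return True
```

In this sketch, training examples arise only when the current weights rank an incorrect candidate at or above a correct one, which is the sense of "error-driven" used here; the paper's actual training procedure may differ in its details.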

Disciplines

Computer Sciences

Publication Date

2007

Comments

This is the pre-published version harvested from CIIR.

Citation Information

Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew McCallum. "Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function" (2007)
Available at: http://works.bepress.com/andrew_mccallum/105/