Unpublished Paper
Leveraging Existing Resources using Generalized Expectation Criteria
(2007)
  • Gregory Druck
  • Gideon Mann
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the use of generalized expectation (GE) criteria to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a model expectation. In this paper, the expectations are model-predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity recognition, leveraging available lexicons. Using no conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy, a supervised classifier requires 4,000 labeled instances.
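The abstract describes a GE criterion as a KL-divergence score applied to the model-predicted class distribution conditioned on the presence of a selected feature. The following is a minimal sketch of that computation for a multinomial logistic regression model; the function name ge_criterion, the array shapes, and the use of NumPy are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def ge_criterion(theta, X, feature_idx, reference_dist):
        """Sketch of one GE criterion term (illustrative, not the paper's code):
        KL divergence from a reference class distribution to the model's
        predicted class distribution, averaged over instances in which the
        selected feature is present."""
        # Multinomial logistic regression predictions p(y | x), one row per instance.
        scores = X @ theta                              # shape (n_instances, n_classes)
        scores = scores - scores.max(axis=1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)

        # Model expectation: predicted class distribution conditioned on the
        # presence of the selected feature.
        mask = X[:, feature_idx] > 0
        model_dist = probs[mask].mean(axis=0)           # shape (n_classes,)

        # Score: KL(reference || model); a training objective would sum such
        # terms (negated) over all selected features.
        eps = 1e-12
        return np.sum(reference_dist * np.log((reference_dist + eps) / (model_dist + eps)))

In the paper's setting, the reference distributions would be estimated from existing resources such as lexicons rather than specified by hand.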
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Gregory Druck, Gideon Mann and Andrew McCallum. "Leveraging Existing Resources using Generalized Expectation Criteria" (2007)
Available at: http://works.bepress.com/andrew_mccallum/98/