Unpublished Paper
Leveraging Existing Resources using Generalized Expectation Criteria
(2007)
  • Gregory Druck
  • Gideon Mann
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the use of generalized expectation (GE) criteria to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a model expectation. In this paper, the expectations are model-predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity recognition, leveraging available lexicons. Using no conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy, a supervised classifier requires 4,000 labeled instances.
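The abstract describes a GE criterion as a KL-divergence score applied to the model-predicted class distribution conditioned on the presence of a selected feature. The following is a minimal sketch of that computation for a multinomial logistic regression model; the function name ge_criterion, the array shapes, and the use of NumPy are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def ge_criterion(theta, X, feature_idx, reference_dist):
        """Sketch of one GE criterion term (illustrative, not the paper's code):
        KL divergence from a reference class distribution to the model's
        predicted class distribution, averaged over instances in which the
        selected feature is present."""
        # Multinomial logistic regression predictions p(y | x), one row per instance.
        scores = X @ theta                              # shape (n_instances, n_classes)
        scores = scores - scores.max(axis=1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)

        # Model expectation: predicted class distribution conditioned on the
        # presence of the selected feature.
        mask = X[:, feature_idx] > 0
        model_dist = probs[mask].mean(axis=0)           # shape (n_classes,)

        # Score: KL(reference || model); a training objective would sum such
        # terms (negated) over all selected features.
        eps = 1e-12
        return np.sum(reference_dist * np.log((reference_dist + eps) / (model_dist + eps)))

In the paper's setting, the reference distributions would be estimated from existing resources such as lexicons rather than specified by hand.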
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Gregory Druck, Gideon Mann and Andrew McCallum. "Leveraging Existing Resources using Generalized Expectation Criteria" (2007)
Available at: http://works.bepress.com/andrew_mccallum/98/