Unpublished Paper
Toward Interactive Training and Evaluation
(2011)
  • Gregory Druck
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Machine learning often relies on costly labeled data, which impedes its application to new classification and information extraction problems. This motivates the development of methods that leverage our abundant prior knowledge about these problems in learning. Several recently proposed methods incorporate prior knowledge with constraints on the expectations of a probabilistic model. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample of the unlabeled data for the practitioner to inspect. To address these problems, we propose stratified sampling methods that use model expectations as a proxy for latent output variables. In classification and sequence labeling experiments, these sampling strategies reduce accuracy evaluation effort by as much as 53%, provide more reliable estimates of F1 for rare labels, and aid in the specification and refinement of constraints.
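The abstract describes selecting a representative sample of unlabeled data by stratifying on model expectations, so that rare labels are not lost in a uniform random draw. The sketch below is an illustrative interpretation of that idea, not the paper's exact algorithm: items are grouped into strata by the model's predicted label (a proxy for the unknown true label), and slots are allocated per stratum with a guaranteed minimum of one, so rare predicted labels still appear in the inspection set. All names (`stratified_sample`, the toy `preds` data) are hypothetical.

```python
# Sketch: stratified sampling of unlabeled data using model predictions
# as a proxy for latent true labels. Illustrative only; the paper's
# actual strata and allocation scheme may differ.
import random
from collections import defaultdict

def stratified_sample(predictions, k, seed=0):
    """Pick k items for practitioner inspection, stratified by predicted label.

    predictions: list of (item_id, predicted_label) pairs.
    Small strata are sampled first so rare labels survive truncation to k.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item_id, label in predictions:
        strata[label].append(item_id)
    total = len(predictions)
    sample = []
    # Visit smaller strata first; give every stratum at least one slot.
    for label, items in sorted(strata.items(), key=lambda kv: len(kv[1])):
        n = max(1, round(k * len(items) / total))
        sample.extend(rng.sample(items, min(n, len(items))))
    return sample[:k]

# Toy data: 5% of items carry a rare predicted label.
preds = [(i, "rare" if i % 20 == 0 else "common") for i in range(100)]
chosen = stratified_sample(preds, k=10)
```

A uniform random sample of 10 items would miss the 5 "rare" items about 60% of the time; the per-stratum minimum guarantees at least one is inspected, which is the property the paper's F1 estimates for rare labels rely on.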
Publication Date
2011
Comments
This is the pre-publication version harvested from CIIR.
Citation Information
Gregory Druck and Andrew McCallum. "Toward Interactive Training and Evaluation" (2011)
Available at: http://works.bepress.com/andrew_mccallum/67/