Article
Paired-Sampling in Density-Sensitive Active Learning
Computer Science Department
  • Pinar Donmez, Carnegie Mellon University
  • Jaime G. Carbonell, Carnegie Mellon University
Date of Original Version
1-1-2008
Type
Conference Proceeding
Abstract or Description
Active learning consists of principled on-line sampling over unlabeled data to optimize supervised learning rates as a function of the number of labels requested from an external oracle. A new sampling technique for active learning is developed based on two key principles: 1) balanced sampling on both sides of the decision boundary is more effective than sampling one side disproportionately, and 2) exploiting the natural grouping (clustering) of unlabeled data establishes a more meaningful non-Euclidean distance function with respect to estimated category membership. Our new paired-sampling density-sensitive method, which embodies these principles, yields significantly superior performance on multiple active learning data sets over all other sampling methods in our comparative study: representative sampling, uncertainty sampling, density-based sampling, and random sampling.
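The two principles in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the linear boundary `(w, b)`, the Gaussian-kernel density estimate, the `bandwidth` parameter, the scoring rule, and all function names are assumptions made for illustration, and the plain Euclidean kernel stands in for the cluster-based non-Euclidean distance the abstract mentions. The sketch only shows the general idea of requesting labels in pairs, one point per side of the current boundary, while favoring points in dense regions.

```python
import math

def signed_margin(w, b, x):
    """Signed score of x under a hypothetical linear boundary w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def local_density(x, pool, bandwidth=1.0):
    """Unnormalized Gaussian-kernel density of x over the unlabeled pool.
    Assumption: a stand-in for a density-sensitive distance, not the paper's."""
    total = 0.0
    for p in pool:
        sq_dist = sum((a - c) ** 2 for a, c in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total

def select_pair(w, b, pool, bandwidth=1.0):
    """Pick one pool index on each side of the boundary.
    Score (assumed): high local density, small distance to the boundary.
    Returns (positive_side_index, negative_side_index); an entry is None
    if that side of the boundary has no candidates."""
    best = {1: (None, float("-inf")), -1: (None, float("-inf"))}
    for i, x in enumerate(pool):
        m = signed_margin(w, b, x)
        if m == 0:
            continue  # exactly on the boundary: belongs to neither side
        side = 1 if m > 0 else -1
        score = local_density(x, pool, bandwidth) / (1.0 + abs(m))
        if score > best[side][1]:
            best[side] = (i, score)
    return best[1][0], best[-1][0]

if __name__ == "__main__":
    # Toy 2-D pool: two groups near the boundary x1 = 0, plus two outliers.
    pool = [(-0.5, 0.0), (-0.6, 0.0), (-3.0, 0.0),
            (0.4, 0.0), (0.5, 0.1), (4.0, 0.0)]
    print(select_pair((1.0, 0.0), 0.0, pool))
```

Selecting both indices in a single query round is what keeps the labeled set balanced across the boundary. Dividing the density by one plus the margin is just one simple way to trade off representativeness against uncertainty.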
Citation Information
Pinar Donmez and Jaime G. Carbonell. "Paired-Sampling in Density-Sensitive Active Learning" (2008)
Available at: http://works.bepress.com/jaime_carbonell/15/