Unpublished Paper
An Alternative Prior Process for Nonparametric Bayesian Clustering
(2010)
  • Hanna M. Wallach, University of Massachusetts - Amherst
  • Shane T. Jensen
  • Lee Dicker
  • Katherine A. Heller
Abstract
Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
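
To make the "rich-get-richer" contrast concrete, the sketch below computes the probability that the next item joins each existing cluster or starts a new one under the three priors. It is an illustration, not the authors' code. It assumes the standard sequential forms of the Dirichlet and Pitman-Yor predictive probabilities, and it assumes the uniform process assigns equal weight to every existing cluster regardless of its size. The names `theta` (concentration), `discount`, `predictive_probs`, and `sample_partition` are hypothetical and chosen only for this example.

```python
import random
from typing import List


def predictive_probs(counts: List[int], theta: float,
                     process: str = "dirichlet",
                     discount: float = 0.0) -> List[float]:
    """Probabilities of joining each existing cluster; the last entry is a new cluster.

    counts   -- current cluster sizes
    theta    -- concentration parameter (assumed > 0)
    discount -- Pitman-Yor discount in [0, 1); ignored by the other processes
    """
    k = len(counts)
    if process == "dirichlet":
        # Weight proportional to cluster size: larger clusters attract more items.
        weights = [float(c) for c in counts] + [theta]
    elif process == "pitman_yor":
        # Each cluster size is reduced by the discount; the removed mass
        # is moved to the new-cluster option.
        weights = [c - discount for c in counts] + [theta + k * discount]
    elif process == "uniform":
        # Assumed form: every existing cluster is equally likely, whatever its size.
        weights = [1.0] * k + [theta]
    else:
        raise ValueError(f"unknown process: {process}")
    total = sum(weights)
    return [w / total for w in weights]


def sample_partition(n_items: int, theta: float, process: str,
                     discount: float = 0.0, seed: int = 0) -> List[int]:
    """Assign items one at a time and return the resulting cluster sizes."""
    rng = random.Random(seed)
    counts: List[int] = []
    for _ in range(n_items):
        probs = predictive_probs(counts, theta, process, discount)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(counts):
            counts.append(1)      # start a new cluster
        else:
            counts[choice] += 1   # join an existing cluster
    return counts
```

With cluster sizes `[3, 1]` and `theta = 1.0`, the Dirichlet weights favor the larger cluster, while the uniform weights treat both clusters and the new-cluster option identically. Calling `sample_partition` with different `process` values gives a rough feel for how the cluster-size distributions differ. Because items are assigned sequentially, the uniform variant depends on the order in which items arrive, which is the loss of exchangeability the abstract describes.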
Publication Date
2010
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Hanna M. Wallach, Shane T. Jensen, Lee Dicker and Katherine A. Heller. "An Alternative Prior Process for Nonparametric Bayesian Clustering" (2010)
Available at: http://works.bepress.com/hanna_wallach/11/