Unpublished Paper
Rethinking LDA: Why Priors Matter
(2009)
  • Hanna M. Wallach, University of Massachusetts - Amherst
  • David Mimno
  • Andrew McCallum
Abstract
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
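For illustration only (not from the paper): the recommended prior structure, an asymmetric or learned Dirichlet prior over document–topic distributions combined with a symmetric prior over topic–word distributions, can be approximated in off-the-shelf LDA implementations. The sketch below assumes gensim's LdaModel, whose `alpha="auto"` option optimizes per-topic concentration parameters during training; the toy corpus and parameter values are placeholders.

```python
# Minimal sketch, assuming gensim: learn an asymmetric document-topic prior
# (alpha optimized per topic) while keeping a fixed symmetric topic-word prior.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus for illustration only.
documents = [
    ["topic", "models", "priors", "dirichlet"],
    ["asymmetric", "priors", "improve", "robustness"],
    ["symmetric", "priors", "over", "word", "distributions"],
]

dictionary = Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=10,
    alpha="auto",      # optimize an asymmetric document-topic prior from the data
    eta="symmetric",   # keep a fixed symmetric topic-word prior
    passes=10,
)

# Inspect the learned per-topic concentration parameters.
print(lda.alpha)
```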
Publication Date
2009
Comments
This is the pre-publication version harvested from CIIR.
Citation Information
Hanna M. Wallach, David Mimno and Andrew McCallum. "Rethinking LDA: Why Priors Matter" (2009)
Available at: http://works.bepress.com/andrew_mccallum/31/