Unpublished Paper
Rethinking LDA: Why Priors Matter
(2009)
  • Hanna M. Wallach, University of Massachusetts - Amherst
  • David Mimno
  • Andrew McCallum
Abstract
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
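For illustration only (not from the paper): the recommended prior structure, an asymmetric or learned Dirichlet prior over document–topic distributions combined with a symmetric prior over topic–word distributions, can be approximated in off-the-shelf LDA implementations. The sketch below assumes gensim's LdaModel, whose `alpha="auto"` option optimizes per-topic concentration parameters during training; the toy corpus and parameter values are placeholders.

```python
# Minimal sketch, assuming gensim: learn an asymmetric document-topic prior
# (alpha optimized per topic) while keeping a fixed symmetric topic-word prior.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus for illustration only.
documents = [
    ["topic", "models", "priors", "dirichlet"],
    ["asymmetric", "priors", "improve", "robustness"],
    ["symmetric", "priors", "over", "word", "distributions"],
]

dictionary = Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=10,
    alpha="auto",      # optimize an asymmetric document-topic prior from the data
    eta="symmetric",   # keep a fixed symmetric topic-word prior
    passes=10,
)

# Inspect the learned per-topic concentration parameters.
print(lda.alpha)
```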
Publication Date
2009
Comments
This is the pre-publication version harvested from CIIR.
Citation Information
Hanna M. Wallach, David Mimno and Andrew McCallum. "Rethinking LDA: Why Priors Matter" (2009)
Available at: http://works.bepress.com/andrew_mccallum/31/