Unpublished Paper
Undirected and Interpretable Continuous Topic Models of Documents
(2007)
  • X. Wang
  • K. Crammer
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
We propose a new type of undirected graphical model suitable for topic modeling and dimensionality reduction of large text collections. Unlike previous Boltzmann machine- and harmonium-based methods, this new model represents words using discrete distributions, akin to traditional `bag-of-words' methods. However, in contrast to directed topic models such as latent Dirichlet allocation, each word is drawn from a distribution that takes all possible topics into account, rather than from a topic-specific distribution. Furthermore, our models use positive continuous-valued latent variables and learn more interpretable latent topic spaces than previous undirected techniques. As with other undirected models, once such a model has been learned, the inference required to represent a document in the latent space is fast. We present document retrieval experiments showing the benefits of our new approach.
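The two properties the abstract emphasizes — every word's distribution mixes contributions from all topics at once, and mapping a document into the latent space is a fast feedforward computation — can be illustrated with a minimal sketch. Everything below (the weight matrix W, the softplus link, the softmax over the vocabulary, all dimensions) is an illustrative assumption, not the paper's actual model or parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 1000, 20                          # vocabulary size, number of topics (assumed)
W = rng.normal(scale=0.1, size=(K, V))   # hypothetical topic-word weight matrix

def infer_latent(counts):
    """Fast 'inference': a single feedforward pass from word counts to
    positive continuous latent activations (softplus keeps them >= 0)."""
    return np.log1p(np.exp(W @ counts))

def word_distribution(h):
    """One distribution over the whole vocabulary in which EVERY topic
    contributes, rather than a separate per-topic word distribution."""
    logits = h @ W
    logits -= logits.max()               # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

counts = rng.integers(0, 3, size=V).astype(float)  # toy bag-of-words document
h = infer_latent(counts)                 # document's representation in latent space
p = word_distribution(h)                 # all-topics word distribution
```

Note the contrast with a directed model like LDA, where a word is generated by first picking one topic and then sampling from that topic's own distribution; here the latent vector h conditions a single vocabulary-wide distribution directly.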
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
X. Wang, K. Crammer and Andrew McCallum. "Undirected and Interpretable Continuous Topic Models of Documents" (2007)
Available at: http://works.bepress.com/andrew_mccallum/108/