Unpublished Paper
Undirected and Interpretable Continuous Topic Models of Documents
(2007)
  • X. Wang
  • K. Crammer
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
We propose a new type of undirected graphical model suitable for topic modeling and dimensionality reduction of large text collections. Unlike previous Boltzmann machine- and harmonium-based methods, this new model represents words using discrete distributions, akin to traditional `bag-of-words' methods. However, in contrast to directed topic models such as latent Dirichlet allocation, each word is drawn from a distribution that takes all possible topics into account, rather than from a topic-specific distribution. Furthermore, our models use positive continuous-valued latent variables and learn more interpretable latent topic spaces than previous undirected techniques. As with other undirected models, once such a model has been learned, the inference required to represent a document in the latent space is fast. We present document retrieval experiments showing the benefits of our new approach.
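The two properties the abstract emphasizes — every word's distribution mixes contributions from all topics at once, and mapping a document into the latent space is a fast feedforward computation — can be illustrated with a minimal sketch. Everything below (the weight matrix W, the softplus link, the softmax over the vocabulary, all dimensions) is an illustrative assumption, not the paper's actual model or parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 1000, 20                          # vocabulary size, number of topics (assumed)
W = rng.normal(scale=0.1, size=(K, V))   # hypothetical topic-word weight matrix

def infer_latent(counts):
    """Fast 'inference': a single feedforward pass from word counts to
    positive continuous latent activations (softplus keeps them >= 0)."""
    return np.log1p(np.exp(W @ counts))

def word_distribution(h):
    """One distribution over the whole vocabulary in which EVERY topic
    contributes, rather than a separate per-topic word distribution."""
    logits = h @ W
    logits -= logits.max()               # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

counts = rng.integers(0, 3, size=V).astype(float)  # toy bag-of-words document
h = infer_latent(counts)                 # document's representation in latent space
p = word_distribution(h)                 # all-topics word distribution
```

Note the contrast with a directed model like LDA, where a word is generated by first picking one topic and then sampling from that topic's own distribution; here the latent vector h conditions a single vocabulary-wide distribution directly.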
Publication Date
2007
Comments
This is the pre-published version harvested from CIIR.
Citation Information
X. Wang, K. Crammer and Andrew McCallum. "Undirected and Interpretable Continuous Topic Models of Documents" (2007)
Available at: http://works.bepress.com/andrew_mccallum/108/