Unpublished Paper
Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
(2006)
  • Wei Li
  • Andrew McCallum, University of Massachusetts - Amherst
Abstract
Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
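The DAG structure described above can be illustrated with a minimal sketch that samples one document from a four-level PAM: a root, a layer of super-topics, a layer of sub-topics, and the vocabulary words as leaves. This is one special case of the arbitrary DAG the abstract describes. The sketch assumes that the root and each super-topic draw a per-document multinomial over their children from a Dirichlet, and that each sub-topic's distribution over words is fixed for the whole corpus. Each word is generated by walking a path from the root down to a leaf. The names `generate_document`, `root_alpha`, `super_alphas`, and `topic_word` are hypothetical. Inference, meaning fitting these parameters to data, is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return index i with probability proportional to probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, root_alpha, super_alphas, topic_word, rng):
    """Sample words from a hypothetical four-level PAM (root -> super -> sub -> word).

    root_alpha:   Dirichlet parameters over the S super-topics (children of the root).
    super_alphas: S lists, each holding Dirichlet parameters over the K sub-topics.
    topic_word:   K lists, each a corpus-level distribution over V vocabulary words.
    """
    # Per-document multinomials: one at the root and one at each super-topic.
    # These capture which super-topics, and which sub-topic combinations, co-occur.
    theta_root = sample_dirichlet(root_alpha, rng)
    theta_super = [sample_dirichlet(a, rng) for a in super_alphas]

    words, paths = [], []
    for _ in range(n_words):
        s = sample_categorical(theta_root, rng)      # interior node: super-topic
        k = sample_categorical(theta_super[s], rng)  # interior node: sub-topic
        w = sample_categorical(topic_word[k], rng)   # leaf: vocabulary word
        words.append(w)
        paths.append((s, k))
    return words, paths
```

For example, with a seeded `random.Random(0)`, two super-topics, three sub-topics, and a four-word vocabulary, `generate_document(50, ...)` returns 50 word indices along with the (super-topic, sub-topic) path that produced each one. Because a super-topic's Dirichlet can put most of its mass on a few sub-topics, it encodes which sub-topics tend to appear together. This is the kind of nested, possibly sparse correlation that PAM is designed to capture.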
Publication Date
2006
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Wei Li and Andrew McCallum. "Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations" (2006)
Available at: http://works.bepress.com/andrew_mccallum/130/