Unpublished Paper
Polylingual Topic Models
(2009)
  • David Mimno
  • Hanna M. Wallach, University of Massachusetts - Amherst
  • Jason Naradowsky
  • David A. Smith
  • Andrew McCallum
Abstract
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of interlinked documents in dozens of languages, such as Wikipedia, are now widely available, calling for tools that can characterize content in many languages. We introduce a polylingual topic model that discovers topics aligned across multiple languages. We explore the model's characteristics using two large corpora, each with over ten different languages, and demonstrate its usefulness in supporting machine translation and tracking topic trends across languages.
Publication Date
2009
Comments
This is the pre-published version harvested from CIIR.
Citation Information
David Mimno, Hanna M. Wallach, Jason Naradowsky, David A. Smith, et al. "Polylingual Topic Models" (2009)
Available at: http://works.bepress.com/andrew_mccallum/33/