Improve Latent Semantic Analysis Based Language Model by Integrating Multiple Level Knowledge
Computer Science Department
  • Rong Zhang, Carnegie Mellon University
  • Alexander I Rudnicky, Carnegie Mellon University
Date of Original Version
1-1-2002
Type
Conference Proceeding
Abstract or Description
We describe an extension to the use of Latent Semantic Analysis (LSA) for language modeling. This technique makes it easier to exploit long-distance relationships in natural language for which the traditional n-gram is unsuited. However, as the history grows longer, its semantic representation may be contaminated by irrelevant information, increasing the uncertainty in predicting the next word. To address this problem, we propose a multilevel framework that divides the history into three levels corresponding to document, paragraph, and sentence. A Softmax network is used to combine the three levels of information with the n-gram. We further present a statistical scheme that dynamically determines the unit scope during the generalization stage. The combination of all these techniques yields a 14% perplexity reduction on a subset of the Wall Street Journal corpus, compared with the trigram model.
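The combination the abstract describes, n-gram probabilities merged with document-, paragraph-, and sentence-level LSA similarities through a Softmax, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function names, the cosine-similarity scoring, the fixed level weights, and the log-linear mixing are all illustrative choices of this sketch, not the paper's exact network or training procedure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two LSA-space vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def multilevel_lsa_score(word_vec, doc_vec, para_vec, sent_vec, weights):
    """Weighted combination of a candidate word's LSA similarity to the
    document-, paragraph-, and sentence-level history representations.
    `weights` is a hypothetical 3-vector of level weights."""
    sims = np.array([cosine(word_vec, h) for h in (doc_vec, para_vec, sent_vec)])
    return float(weights @ sims)

def combined_prob(ngram_probs, lsa_scores, temperature=1.0):
    """Softmax-style merge of n-gram probabilities with per-word LSA scores
    over the vocabulary: add the LSA score to the n-gram log-probability,
    then renormalize. (An assumed log-linear form, not the paper's exact one.)"""
    logits = np.log(np.asarray(ngram_probs, dtype=float) + 1e-12)
    logits = logits + np.asarray(lsa_scores, dtype=float) / temperature
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()
```

For example, a trigram distribution over a three-word vocabulary can be reweighted by per-word multilevel LSA scores with `combined_prob([0.5, 0.3, 0.2], [0.1, 0.2, 0.3])`, which boosts words semantically close to the history while keeping a proper probability distribution.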
Citation Information
Rong Zhang and Alexander I Rudnicky. "Improve Latent Semantic Analysis Based Language Model by Integrating Multiple Level Knowledge" (2002)
Available at: http://works.bepress.com/alexander_rudnicky/82/