Improve Latent Semantic Analysis Based Language Model by Integrating Multiple Level Knowledge
Computer Science Department
Date of Original Version: 1-1-2002
Abstract or Description: We describe an extension to the use of Latent Semantic Analysis (LSA) for language modeling. This technique makes it easier to exploit the long-distance relationships in natural language for which the traditional n-gram is unsuited. However, as the history grows longer, its semantic representation may be contaminated by irrelevant information, increasing the uncertainty in predicting the next word. To address this problem, we propose a multilevel framework that divides the history into three levels corresponding to document, paragraph, and sentence. A Softmax network is used to combine the three levels of information with the n-gram. We further present a statistical scheme that dynamically determines the unit scope in the generalization stage. The combination of all these techniques leads to a 14% perplexity reduction on a subset of the Wall Street Journal corpus, compared with the trigram model.
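The abstract describes combining an n-gram estimate with document-, paragraph-, and sentence-level LSA estimates through a Softmax network. The paper's actual architecture is not reproduced here; the sketch below only illustrates the general idea of softmax-weighted interpolation of component word probabilities. All names, component values, and gating logits are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def combined_probability(word_probs, mixing_logits):
    """Interpolate component LM probabilities for one candidate word
    (e.g. trigram plus document-, paragraph-, and sentence-level LSA
    estimates) using softmax-normalized mixture weights."""
    weights = softmax(mixing_logits)
    return float(np.dot(weights, np.asarray(word_probs, dtype=float)))

# Hypothetical component estimates for one candidate word:
p = [0.012, 0.009, 0.011, 0.010]   # trigram, document, paragraph, sentence
logits = [1.5, 0.2, 0.4, 0.3]      # e.g. outputs of a trained gating network
print(combined_probability(p, logits))
```

Because the weights sum to one, the combined estimate always lies between the smallest and largest component probabilities; in the paper the gating would be learned so that the more reliable history scope dominates.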
Citation Information: Rong Zhang and Alexander I. Rudnicky. "Improve Latent Semantic Analysis Based Language Model by Integrating Multiple Level Knowledge" (2002).
Available at: http://works.bepress.com/alexander_rudnicky/82/