Article
Distributed Unsupervised Semantic Parsing of Large-Scale Document Corpora
BigLearn: Neural Information Processing Systems Workshop (2011)
  • Evgeny Sitnikov
  • Achim Rettinger
  • Ole J Mengshoel
Abstract
Large-scale text corpora present both a great opportunity and a great challenge for natural language processing, including machine learning. One state-of-the-art approach, the Unsupervised Semantic Parsing (USP) technique, clusters synonymic relations at the sentence level and has been shown to answer a broad range of questions by exploiting these semantic clusters. In this paper, we propose Distributed USP (DUSP), which improves USP’s ability to handle large text corpora by distributing several of USP’s key algorithmic steps over a cluster of commodity computers. In experiments with DUSP, we processed a corpus that was over 13 times larger than the largest corpus we were able to handle using USP. In addition, DUSP’s processing speed was 284 documents per minute, versus 69.2 for USP.
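The two throughput figures above imply a speedup factor that the abstract does not state explicitly. The following is a minimal sketch of that arithmetic; the names `speedup` and `minutes_for` and the 10,000-document corpus are illustrative assumptions, not taken from the paper, and the calculation assumes throughput stays constant across corpus sizes.

```python
# Throughput figures quoted in the abstract (documents per minute).
DUSP_DOCS_PER_MIN = 284.0
USP_DOCS_PER_MIN = 69.2


def speedup(new_rate: float, old_rate: float) -> float:
    """Return the ratio of two throughputs; above 1 means new_rate is faster."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


def minutes_for(num_docs: int, docs_per_min: float) -> float:
    """Estimate processing time, assuming throughput is constant (an assumption)."""
    if docs_per_min <= 0:
        raise ValueError("docs_per_min must be positive")
    return num_docs / docs_per_min


ratio = speedup(DUSP_DOCS_PER_MIN, USP_DOCS_PER_MIN)
print(f"DUSP vs. USP speedup: {ratio:.2f}x")

# Hypothetical corpus size, chosen only for illustration.
HYPOTHETICAL_DOCS = 10_000
dusp_minutes = minutes_for(HYPOTHETICAL_DOCS, DUSP_DOCS_PER_MIN)
usp_minutes = minutes_for(HYPOTHETICAL_DOCS, USP_DOCS_PER_MIN)
print(f"DUSP: {dusp_minutes:.1f} min, USP: {usp_minutes:.1f} min")
```

Under these assumptions the reported rates correspond to roughly a 4.1-fold increase in processing speed. The abstract does not say whether both rates were measured on the same corpus or hardware, so the ratio should be read as indicative only.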
Keywords
  • Text Corpora
  • Natural Language Processing
  • Machine Learning
  • Clustering
  • Unsupervised Semantic Parsing
  • Distribution
  • Computer Cluster
Publication Date
December 2011
Citation Information
Evgeny Sitnikov, Achim Rettinger and Ole J Mengshoel. "Distributed Unsupervised Semantic Parsing of Large-Scale Document Corpora" BigLearn: Neural Information Processing Systems Workshop (2011)
Available at: http://works.bepress.com/ole_mengshoel/86/