Presentation
The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages
Proceedings of the First Workshop on Multilingual Modeling (2012)
  • Loganathan Ramasamy, Charles University
  • Zdeněk Žabokrtský, Charles University
  • Sowmya Vajjala, Universität Tübingen
Abstract
Morph length is one of the indicative features that help in learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how knowledge of morph length affects the performance of the segmentation task under the Bayesian framework. The model is based on the unigram word segmentation model of Goldwater et al. (2006) and assumes a simple prior distribution over morph length. We experiment with this model on two closely related agglutinative languages, Tamil and Telugu, and compare our results with the state-of-the-art Morfessor system. We show that knowledge of morph length has a positive impact and provides competitive results in terms of overall performance.
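To make the idea of a length prior inside a unigram segmentation model more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which distribution is placed over morph length, so the shifted Poisson prior, the uniform character model, the hyperparameter values (`lam`, `alphabet_size`, `alpha`) and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def length_prior(length, lam=2.0):
    """Assumed prior over morph length: (length - 1) ~ Poisson(lam).

    The shift by one keeps the prior defined only for lengths >= 1.
    """
    k = length - 1
    return math.exp(-lam) * lam ** k / math.factorial(k)


def base_prob(morph, alphabet_size=30, lam=2.0):
    """Base distribution P0(morph): length prior times uniform characters."""
    return length_prior(len(morph), lam) * (1.0 / alphabet_size) ** len(morph)


def segmentation_log_prob(morphs, alpha=1.0, alphabet_size=30, lam=2.0):
    """Log probability of a morph sequence under a Dirichlet-process
    unigram model: each morph is either reused in proportion to its
    count so far, or generated afresh from the base distribution.
    """
    counts = Counter()
    total = 0
    logp = 0.0
    for m in morphs:
        p = (counts[m] + alpha * base_prob(m, alphabet_size, lam)) / (total + alpha)
        logp += math.log(p)
        counts[m] += 1
        total += 1
    return logp


# Toy comparison on made-up strings (not real Tamil or Telugu data):
# two candidate segmentations of the same two hypothetical words.
seg_a = ["abc", "de", "abc", "fg"]   # reuses the morph "abc"
seg_b = ["abcde", "abcfg"]           # no segmentation, no reuse
print(segmentation_log_prob(seg_a), segmentation_log_prob(seg_b))
```

In this sketch the length prior enters only through the base distribution, so it mainly influences how readily new morphs of a given length are proposed; changing `lam` shifts the preferred morph length.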
Publication Date
July, 2012
Location
Jeju, Republic of Korea
Comments
Copyright 2012 The Authors
Citation Information
Loganathan Ramasamy, Zdeněk Žabokrtský and Sowmya Vajjala. "The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages" Proceedings of the First Workshop on Multilingual Modeling (2012)
Available at: http://works.bepress.com/sowmya-vajjala/14/