Contribution to Book
Sign Constraints on Feature Weights Improve a Joint Model of Word Segmentation and Phonology
Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL (2015)
  • Mark Johnson, Macquarie University
  • Emmanuel Dupoux, Ecole des Hautes Etudes en Sciences Sociales
  • Joe Pater
  • Robert Staubs
Abstract
This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Optimality Theory (OT; Prince and Smolensky, 2004), a standard phonological framework. The features in our model are inspired by OT’s Markedness and Faithfulness constraints. Following the OT principle that such features indicate “violations”, we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on feature weights are crucial for accurate identification of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
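The idea of a log-linear model whose feature weights are constrained to be non-positive can be illustrated with a small sketch. This is not the authors' implementation: the two features (a Markedness-style and a Faithfulness-style violation count), the candidate set, the weights, and the projected-gradient update are all assumptions chosen only to show how a sign constraint can be enforced.

```python
import math

def maxent_probs(violations, weights):
    """Log-linear distribution over candidates.

    violations: one list of violation counts per candidate.
    weights: one weight per feature (expected to be <= 0).
    """
    scores = [sum(w * v for w, v in zip(weights, cand)) for cand in violations]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def project_nonpositive(weights):
    """Enforce the sign constraint by clipping each weight at zero."""
    return [min(0.0, w) for w in weights]

def gradient_step(violations, weights, observed, lr=0.1):
    """One log-likelihood ascent step followed by projection.

    observed: index of the candidate seen in the data (hypothetical).
    Projection is an illustrative choice, not the paper's optimizer.
    """
    probs = maxent_probs(violations, weights)
    n = len(weights)
    expected = [sum(p * cand[k] for p, cand in zip(probs, violations))
                for k in range(n)]
    grad = [violations[observed][k] - expected[k] for k in range(n)]
    return project_nonpositive([w + lr * g for w, g in zip(weights, grad)])

# Hypothetical candidates for one underlying form ending in /t/:
#   candidate 0 keeps the /t/   -> 1 Markedness violation, 0 Faithfulness
#   candidate 1 deletes the /t/ -> 0 Markedness violations, 1 Faithfulness
violations = [[1, 0], [0, 1]]
weights = [-1.0, -2.0]  # assumed values, both non-positive

probs = maxent_probs(violations, weights)
updated = gradient_step(violations, [0.0, 0.0], observed=0)
```

With these assumed weights the candidate that keeps the /t/ receives the higher probability, because it incurs the smaller total penalty. In the update starting from zero weights, the unconstrained step would push the first weight above zero, and the projection clips it back, so a violation can never be rewarded.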
Publication Date
2015
Publisher
Association for Computational Linguistics
Citation Information
Mark Johnson, Emmanuel Dupoux, Joe Pater and Robert Staubs. "Sign Constraints on Feature Weights Improve a Joint Model of Word Segmentation and Phonology" Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL (2015) p. 303 - 313
Available at: http://works.bepress.com/joe_pater/13/