Article
Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Proceedings of MT Summit XII
  • Vamshi Ambati, Carnegie Mellon University
  • Alon Lavie, Carnegie Mellon University
  • Jaime G. Carbonell, Carnegie Mellon University
Date of Original Version
8-1-2009
Type
Conference Proceeding
Rights Management
Copyright 2009 AMTA
Abstract or Description

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, expressed as sets of structural, boundary, and labeling constraints. Factoring syntax in this manner allows our framework to work with independent annotations from multiple resources rather than requiring a single syntactic structure. We then explore the lexical coverage of translation models learned using syntax from one side versus both sides, looking specifically at how the non-isomorphic nature of the parse trees for the two languages affects coverage. We propose a novel technique for restructuring target-side parse trees that generates alternate isomorphic target trees while preserving the syntactic boundaries of constituents that were aligned in the original parse trees. We also show that combining rules extracted by restructuring syntactic trees on both sides produces significantly better translation models. The improved precision and coverage of our syntax tables particularly compensate for the lack of lexical coverage in syntax-based machine translation approaches.

Citation Information
Vamshi Ambati, Alon Lavie and Jaime G. Carbonell. "Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages" Proceedings of MT Summit XII (2009)
Available at: http://works.bepress.com/jaime_carbonell/173/