Lexicon construction is at the core of internationalizing speech systems, as it is the locus at which the correspondence between the written and spoken forms of a language is specified. For the most part, speech systems for a given language benefit from the attention of native speakers and the opportunity to tune performance over time, allowing the cost of lexicon development to be amortized. On the other hand, rapid deployment of recognition capability for new languages requires that a usable lexicon be available quickly. We propose a decomposition of the lexicon-building process into four discrete, sequential steps that simplify and speed up the creation of language knowledge bases for recognition and synthesis. Results from four languages are discussed.
Available at: http://works.bepress.com/alexander_rudnicky/9/