Data Selection for Speech Recognition
Computer Science Department
Date of Original Version: 1-1-2007
Abstract or Description: This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, in contrast to the common belief that “there is no data like more data”, we found it possible to select a highly informative subset of data that produces recognition performance comparable to that of a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
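The idea of choosing data so that a target speech unit is evenly represented can be illustrated with a small sketch. The abstract does not specify the selection algorithm, so the code below is an assumption rather than the authors' method: it greedily adds whichever utterance most increases the entropy of the accumulated unit counts, which is one simple way to push the selected subset toward a uniform unit distribution. The names `entropy` and `select_uniform`, the utterance-count budget, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in nats) of the distribution implied by unit counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def select_uniform(utterances, budget):
    """Greedily pick up to `budget` utterances (assumed criterion, not the paper's).

    `utterances` is a list of unit sequences (e.g. phoneme lists).
    At each step, the utterance whose addition gives the highest entropy
    of the accumulated unit counts is chosen. Returns the chosen indices
    in selection order.
    """
    remaining = list(range(len(utterances)))
    chosen = []
    counts = Counter()
    while remaining and len(chosen) < budget:
        best = max(remaining,
                   key=lambda i: entropy(counts + Counter(utterances[i])))
        counts.update(utterances[best])
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy transcripts, already split into the target unit (hypothetical data).
utterances = [
    ["a", "a", "a"],
    ["a", "b"],
    ["c", "c"],
    ["a", "a"],
]
selected = select_uniform(utterances, budget=2)
```

This sketch rescans every remaining utterance at each step, so it is quadratic in the corpus size. It is meant only to make the selection criterion concrete and does not reflect the efficiency the abstract reports.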
Citation Information: Yi Wu, Alexander I Rudnicky and Rong Zhang. "Data Selection for Speech Recognition" (2007)
Available at: http://works.bepress.com/alexander_rudnicky/18/