Usable speech detection using a context dependent Gaussian mixture model classifier
International Symposium on Circuits and Systems (2004)
Abstract: Speech that is corrupted by nonstationary interference but still contains segments usable for applications such as speaker identification or speech recognition is referred to as "usable" speech. A common example of nonstationary interference is co-channel speech, in which more than one person talks at the same time. In general, these speech processing applications do not work in co-channel environments; however, they can work on the extracted usable segments. Unfortunately, currently available usable speech measures detect only about 75% of the total available usable speech. The first source of error is that no single feature can accurately identify all the usable speech characteristics. This can be addressed by using a Gaussian mixture model (GMM) based classifier to combine several usable speech features. The second source of error is that current usable speech measures treat each frame of co-channel data independently of the decisions made on adjacent frames. This can be addressed by using a hidden Markov model (HMM) to incorporate context-dependent information from adjacent frames. Using this approach, we obtained 84% detection of usable speech with a 16% false alarm rate.
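The two-stage idea in the abstract (a GMM per class to combine several features, then an HMM to bring in decisions from adjacent frames) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-dimensional feature vectors, the GMM weights, means, and variances, the 0.9 self-transition probability, and the function names `gmm_loglik` and `viterbi`. In practice the model parameters would be estimated from labeled training data.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    x: (T, D) features; weights: (K,); means, variances: (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def viterbi(log_emis, log_trans, log_prior):
    """Most likely state path. log_emis: (T, S); log_trans: (S, S) indexed
    [from, to]; log_prior: (S,)."""
    T, S = log_emis.shape
    score = log_prior + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # rows: from-state, cols: to-state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emis[t]
    path = np.zeros(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical class models: state 0 = unusable, state 1 = usable.
w = np.array([0.5, 0.5])
var = np.ones((2, 2))
usable_means = np.array([[1.0, 1.0], [2.0, 2.0]])
unusable_means = -usable_means

# Hypothetical feature sequence: three unusable frames, then usable frames
# with one ambiguous frame in the middle of the usable run.
frames = np.array([[-1.5, -1.5]] * 3 + [[1.5, 1.5]] * 2
                  + [[-0.2, -0.1]] + [[1.5, 1.5]] * 2)

log_emis = np.stack([gmm_loglik(frames, w, unusable_means, var),
                     gmm_loglik(frames, w, usable_means, var)], axis=1)

# Frame-independent decision: pick the more likely class for each frame.
framewise = log_emis.argmax(axis=1)

# Context-dependent decision: two-state HMM that favors staying in a state.
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_prior = np.log(np.array([0.5, 0.5]))
smoothed = viterbi(log_emis, log_trans, log_prior)

print("framewise:", framewise.tolist())
print("smoothed: ", smoothed.tolist())
```

In this toy sequence the ambiguous sixth frame is labeled unusable when each frame is decided on its own, but the Viterbi path keeps it in the usable state, because two state switches cost more than the small likelihood gain from that single frame.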
Publication Date: May 2004
Citation Information: Robert E Yantorno, Brett Y Smolenski, Ananth N Iyer and Jashmin K Shah. "Usable speech detection using a context dependent Gaussian mixture model classifier" International Symposium on Circuits and Systems Vol. 5 (2004)
Available at: http://works.bepress.com/iyer/14/