Unsupervised indexing of noisy conversations with short speaker utterances
IEEE Aerospace Conference (2007)
Abstract
Two speaker indexing systems for conversations are presented in this paper. The first method involves indexing two-speaker conversations. In this method, two reference models are judiciously chosen from the conversation such that they represent the two different speakers. Models are then matched to the reference speakers using distance-based comparisons. The second technique is based on first determining the number of participants in the conversation using a speaker count method termed the “Residual Ratio Algorithm” (RRA), and then indexing based on this count. The RRA involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation, and the relative amount of residual speech is observed to determine the count. The distance measures used in comparing models include the Bhattacharyya distance, the T-square statistic and the Mahalanobis distance. The speaker comparison decisions of all three distances are combined to improve the accuracy of the system. Linear Predictive Cepstral Coefficients of voiced phonemes are used in forming speaker models. The two-speaker indexing technique yielded an indexing accuracy of up to 95% when evaluated using SWITCHBOARD data. The counting-indexing technique resulted in a maximum indexing accuracy of about 91% when tested on artificial conversations generated from HTIMIT data.
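The distance-based comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each speech segment is modeled as a Gaussian with diagonal covariance over its feature frames (for example, cepstral vectors), and that the three per-distance decisions are fused by majority vote; the abstract does not state either choice. The function names and the threshold dictionary are hypothetical.

```python
import math
from statistics import fmean, variance


def gaussian_model(frames):
    """Fit a diagonal-covariance Gaussian (assumption): per-dimension mean,
    per-dimension sample variance, and the number of frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims], [variance(d) for d in dims], len(frames)


def mahalanobis_sq(model_a, model_b):
    """Squared Mahalanobis distance between the means, using pooled variances."""
    (ma, va, na), (mb, vb, nb) = model_a, model_b
    pooled = [((na - 1) * x + (nb - 1) * y) / (na + nb - 2) for x, y in zip(va, vb)]
    return sum((a - b) ** 2 / v for a, b, v in zip(ma, mb, pooled))


def t_square(model_a, model_b):
    """Two-sample Hotelling T-square statistic with a diagonal pooled covariance."""
    na, nb = model_a[2], model_b[2]
    return na * nb / (na + nb) * mahalanobis_sq(model_a, model_b)


def bhattacharyya(model_a, model_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (ma, va, _), (mb, vb, _) = model_a, model_b
    avg = [(x + y) / 2 for x, y in zip(va, vb)]
    mean_term = sum((a - b) ** 2 / v for a, b, v in zip(ma, mb, avg)) / 8
    cov_term = 0.5 * sum(
        math.log(v / math.sqrt(x * y)) for v, x, y in zip(avg, va, vb)
    )
    return mean_term + cov_term


def same_speaker(model_a, model_b, thresholds):
    """Majority vote over the three decisions (assumed fusion rule).
    The threshold values are hypothetical and would need tuning on data."""
    votes = [
        bhattacharyya(model_a, model_b) < thresholds["bhattacharyya"],
        t_square(model_a, model_b) < thresholds["t_square"],
        mahalanobis_sq(model_a, model_b) < thresholds["mahalanobis"],
    ]
    return sum(votes) >= 2
```

In use, a segment's model would be compared against each reference model with `same_speaker`, and the segment labeled with the reference it matches. A full-covariance model would replace the per-dimension divisions with a matrix inverse and the log-variance sum with log-determinants.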
Publication Date
March 2007
Citation Information
Uchechukwu O Ofoegbu, Ananth N Iyer, Robert E Yantorno and Stanley J Wenndt. "Unsupervised indexing of noisy conversations with short speaker utterances" IEEE Aerospace Conference (2007)
Available at: http://works.bepress.com/iyer/17/