The object function for Boosting training method in acoustic modeling aims to reduce utterance level error rate. This is different from the most commonly used performance metric in speech recognition, word error rate. This paper proposes that the combination of N-best list re-ranking and ROVER can partly address this problem. In particular, model combination is applied to re-ranked hypotheses rather than to the original top-1 hypotheses and carried on word level. Improvement of system performance is observed in our experiments. In addition, we describe and evaluate a new confidence feature that measures the correctness of frame level decoding result.
Available at: http://works.bepress.com/alexander_rudnicky/1/