This paper describes the development of a word-level confidence metric suitable for use in a dialog system. Two aspects of the problem are investigated: the identification of useful features and the selection of an effective classifier. We find that two parse-level features, Parsing-Mode and SlotBackoff-Mode, provide annotation accuracy comparable to that observed for decoder-level features. However, both decoder-level and parse-level features independently contribute to confidence annotation accuracy. In comparing different classification techniques, we find that Support Vector Machines (SVMs) appear to provide the best accuracy. Overall, we achieve a 39.7% reduction in annotation uncertainty for a binary confidence decision in a travel-planning domain.
Available at: http://works.bepress.com/alexander_rudnicky/76/