Unpublished Paper
Combining Text and Audio-Visual Features in Video Indexing
(2005)
  • Shih-Fu Chang
  • R. Manmatha, University of Massachusetts - Amherst
  • Tat-Seng Chua
Abstract

We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance are described, primarily in the broadcast news video domain.
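The fusion frameworks the abstract refers to commonly include late (score-level) fusion, in which each modality scores a candidate shot independently and the scores are then combined. A minimal sketch, assuming a weighted linear combination; the function names, the weight `alpha`, and the example scores are illustrative, not taken from the paper:

```python
# Late-fusion sketch: an ASR-text retriever and a visual concept detector
# each assign a relevance score to every video shot; the fused score is a
# convex combination of the two. All names and values here are hypothetical.

def late_fusion(text_scores, visual_scores, alpha=0.6):
    """Combine per-shot scores from two modalities.

    alpha weights the (noisy) ASR-text evidence against the visual evidence.
    """
    assert len(text_scores) == len(visual_scores)
    return [alpha * t + (1 - alpha) * v
            for t, v in zip(text_scores, visual_scores)]

def rank_shots(text_scores, visual_scores, alpha=0.6):
    """Return shot indices sorted by fused score, best first."""
    fused = late_fusion(text_scores, visual_scores, alpha)
    return sorted(range(len(fused)), key=lambda i: -fused[i])

# Example: three shots; text evidence favors shot 0, visual favors shot 2.
text = [0.9, 0.2, 0.1]
visual = [0.1, 0.3, 0.8]
print(rank_shots(text, visual))  # → [0, 2, 1]
```

In practice the weight would be learned on held-out data, and more elaborate fusion (e.g. training a classifier over the concatenated modality scores) is among the machine learning techniques such surveys review.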

Publication Date
2005
Comments
This is the pre-publication version harvested from CIIR.
Citation Information
Shih-Fu Chang, R. Manmatha and Tat-Seng Chua. "Combining Text and Audio-Visual Features in Video Indexing" (2005)
Available at: http://works.bepress.com/r_manmatha/34/