Unpublished Paper
Joint Visual-Text Modeling for Automatic Retrieval of Multimedia Documents
(2005)
  • G. Iyengar
  • P. Duygulu
  • S. Feng
  • P. Ircing
  • S. P. Khudanpur
  • D. Klakow
  • M. R. Krause
  • R. Manmatha, University of Massachusetts - Amherst
  • H. J. Nock
  • D. Petkova
  • B. Pytlik
  • P. Virga
Abstract

In this paper we describe our approach for jointly modeling the text and visual parts of multimedia documents for the purpose of information retrieval (IR). The prevalent state-of-the-art systems rely on a late combination of two independent systems: one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing. Such systems rarely exceed the performance of any single modality (i.e. text or video) on information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in significant improvement in performance over any single modality. Specifically, we extend language-model-based text-IR approaches to multimedia retrieval. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news video. Our results show an improvement of over 14% in IR performance over the best reported text-only baseline and rank amongst the best results reported on this corpus.
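As a rough illustration of the kind of joint modeling the abstract describes, the sketch below scores a document under a query-likelihood language model in which a text unigram model and a visual-concept model are linearly interpolated, then smoothed with a collection model. This is a minimal, hypothetical sketch: the mixing weights `alpha` and `lam`, the function name, and the dictionary-based representation are illustrative assumptions, not the paper's actual formulation.

```python
import math

def joint_query_likelihood(query_terms, doc_text_lm, doc_visual_lm,
                           collection_lm, alpha=0.5, lam=0.8):
    """Score a multimedia document against a text query.

    doc_text_lm   : P(term | document text)      -- dict term -> prob
    doc_visual_lm : P(term | document visuals)   -- dict term -> prob
                    (e.g. probabilities of concept labels predicted
                    from the video frames; hypothetical here)
    collection_lm : P(term | whole collection), used for smoothing
    alpha         : weight on the text modality vs. the visual one
    lam           : Jelinek-Mercer smoothing weight on the document model
    """
    score = 0.0
    for t in query_terms:
        # Joint document model: interpolate text and visual estimates.
        p_doc = alpha * doc_text_lm.get(t, 0.0) \
            + (1.0 - alpha) * doc_visual_lm.get(t, 0.0)
        # Smooth with the collection model so unseen terms do not zero out.
        p = lam * p_doc + (1.0 - lam) * collection_lm.get(t, 1e-9)
        score += math.log(p)
    return score
```

Under this toy model, a document whose visual concepts match the query can outscore a text-only match even when its transcript lacks the query terms, which is the intuition behind combining the modalities before ranking rather than fusing two independent ranked lists afterwards.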

Keywords
  • Information Search and Retrieval,
  • Image Processing and Computer Vision,
  • Joint Visual-Text Models,
  • TRECVID,
  • Multimedia Retrieval Models
Publication Date
2005
Comments
This is the pre-published version harvested from CIIR.
Citation Information
G. Iyengar, P. Duygulu, S. Feng, P. Ircing, et al. "Joint Visual-Text Modeling for Automatic Retrieval of Multimedia Documents" (2005)
Available at: http://works.bepress.com/r_manmatha/35/