Unpublished Paper
Joint Visual-Text Modeling for Automatic Retrieval of Multimedia Documents
(2005)
  • G. Iyengar
  • P. Duygulu
  • S. Feng
  • P. Ircing
  • S. P. Khudanpur
  • D. Klakow
  • M. R. Krause
  • R. Manmatha, University of Massachusetts - Amherst
  • H. J. Nock
  • D. Petkova
  • B. Pytlik
  • P. Virga
Abstract

In this paper we describe our approach for jointly modeling the text and visual parts of multimedia documents for the purpose of information retrieval (IR). The prevalent state-of-the-art systems rely on a late combination of two independent systems: one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing. Such systems rarely exceed the performance of any single modality (i.e. text or video) on information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in significant improvement in performance over any single modality. Specifically, we extend language-model-based text-IR approaches to multimedia retrieval. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news video. Our results show an improvement of over 14% in IR performance over the best reported text-only baseline and rank amongst the best results reported on this corpus.
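As a rough illustration of the kind of joint modeling the abstract describes, the sketch below scores a document under a query-likelihood language model in which a text unigram model and a visual-concept model are linearly interpolated, then smoothed with a collection model. This is a minimal, hypothetical sketch: the mixing weights `alpha` and `lam`, the function name, and the dictionary-based representation are illustrative assumptions, not the paper's actual formulation.

```python
import math

def joint_query_likelihood(query_terms, doc_text_lm, doc_visual_lm,
                           collection_lm, alpha=0.5, lam=0.8):
    """Score a multimedia document against a text query.

    doc_text_lm   : P(term | document text)      -- dict term -> prob
    doc_visual_lm : P(term | document visuals)   -- dict term -> prob
                    (e.g. probabilities of concept labels predicted
                    from the video frames; hypothetical here)
    collection_lm : P(term | whole collection), used for smoothing
    alpha         : weight on the text modality vs. the visual one
    lam           : Jelinek-Mercer smoothing weight on the document model
    """
    score = 0.0
    for t in query_terms:
        # Joint document model: interpolate text and visual estimates.
        p_doc = alpha * doc_text_lm.get(t, 0.0) \
            + (1.0 - alpha) * doc_visual_lm.get(t, 0.0)
        # Smooth with the collection model so unseen terms do not zero out.
        p = lam * p_doc + (1.0 - lam) * collection_lm.get(t, 1e-9)
        score += math.log(p)
    return score
```

Under this toy model, a document whose visual concepts match the query can outscore a text-only match even when its transcript lacks the query terms, which is the intuition behind combining the modalities before ranking rather than fusing two independent ranked lists afterwards.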

Keywords
  • Information Search and Retrieval,
  • Image Processing and Computer Vision,
  • Joint Visual-Text Models,
  • TRECVID,
  • Multimedia Retrieval Models
Publication Date
2005
Comments
This is the pre-published version harvested from CIIR.
Citation Information
G. Iyengar, P. Duygulu, S. Feng, P. Ircing, et al. "Joint Visual-Text Modeling for Automatic Retrieval of Multimedia Documents" (2005)
Available at: http://works.bepress.com/r_manmatha/35/