Unpublished Paper
Word Image Matching Using Dynamic Time Warping
(2002)
  • Toni M. Rath
  • R. Manmatha, University of Massachusetts - Amherst
Abstract

Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on the web or on CD-ROMs. Providing convenient access to a collection requires an index, which is manually created at great labour and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed. It addresses the need for indexing single-author handwritten historical manuscripts in a new way: word images are matched to form clusters which contain occurrences of the same word throughout a collection. By annotating "interesting" clusters, an index can be built automatically.

Given a segmented page, matching handwritten word images in historical documents is a great challenge due to the variations in handwriting and the noise in the image. We present an algorithm for matching handwritten words in historical documents using dynamic time warping. The images are preprocessed to create a set of 1-dimensional features. The features extracted from words are compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.
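
The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features or the local cost, so the per-column ink count (`column_profile`), the squared Euclidean local distance, and the unconstrained three-way recurrence in `dtw_distance` are all assumptions chosen for illustration, as are the function names themselves.

```python
import math


def column_profile(image):
    """Assumed illustrative feature: ink-pixel count per image column.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    Returns one 1-tuple per column, i.e. a 1-dimensional feature sequence.
    """
    width = len(image[0])
    return [(sum(row[c] for row in image),) for c in range(width)]


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two feature sequences.

    `a` and `b` are non-empty lists of equal-length numeric tuples.
    The local cost is the squared Euclidean distance (an assumption).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Two toy "word images": the second is a horizontally stretched copy of the first.
word_a = [[1, 0, 1],
          [1, 1, 1]]
word_b = [[1, 0, 0, 1],
          [1, 1, 1, 1]]
distance = dtw_distance(column_profile(word_a), column_profile(word_b))
```

Because the warping path may repeat a column of either sequence, the stretched copy aligns with the original at zero cost, which is the property that makes this kind of matching tolerant of variation in handwriting width. A practical system would typically normalize the cost by path length and restrict the warping path to a band; neither is shown here.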

Publication Date
2002
Comments
This is the pre-published version harvested from CIIR.
Citation Information
Toni M. Rath and R. Manmatha. "Word Image Matching Using Dynamic Time Warping" (2002)
Available at: http://works.bepress.com/r_manmatha/17/