Unpublished Paper
Scale Space Technique for Word Segmentation in Handwritten Manuscripts
(1999)
  • R. Manmatha, University of Massachusetts - Amherst
  • Nitin Srimal
Abstract

Indexing large archives of historical manuscripts, such as the papers of George Washington, is required to allow rapid perusal by scholars and researchers who wish to consult the original manuscripts. Presently, such large archives are indexed manually. Since optical character recognition (OCR) works poorly with handwriting, a scheme based on matching word images, called word spotting, has previously been suggested for indexing such documents. The important steps in this scheme are segmenting a document page into words and creating lists that contain instances of the same word by word image matching.

We have developed a novel methodology for segmenting handwritten document images by analyzing the extent of "blobs" in a scale space representation of the image. The algorithm has been applied to around 30 grey level images randomly picked from different sections of the George Washington corpus of 6,400 handwritten document images. An accuracy of 77 to 96 percent was observed, with an average accuracy of around 87 percent. The algorithm works well in the presence of noise, shine-through and other artifacts which may arise due to aging and degradation of the page over a couple of centuries or through the man-made processes of photocopying and scanning. Most existing document analysis systems have been developed for machine printed text. There has been little work on word segmentation for handwritten documents. Most of this work has been applied to special kinds of pages, for example addresses or "clean" pages which have been written specifically for testing document analysis systems. Historical manuscripts suffer from many problems including noise, shine-through and other artifacts due to aging and degradation. No good techniques exist to segment words from such handwritten manuscripts. Further, scale space techniques have not been applied to this problem before.
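The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of reading words off blobs at a suitable scale, not the authors' method. It assumes a one-dimensional simplification: a single text line reduced to a per-column ink profile, smoothed with a Gaussian of width `sigma`, with a "blob" taken to be any run of columns whose smoothed value exceeds a threshold. The function names, the toy profile and the threshold of 0.2 are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma):
    """Convolve the profile with a Gaussian; samples outside it count as zero."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * profile[j]
        out.append(acc)
    return out


def blobs(profile, sigma, threshold=0.2):
    """Return (start, end) column runs, end exclusive, above the threshold."""
    smoothed = smooth(profile, sigma)
    runs, start = [], None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(smoothed)))
    return runs


# Toy ink profile (an assumption, not data from the paper): two "words" of
# three letters each. Letters are 3 columns wide, gaps inside a word are
# 2 columns wide, and the gap between the words is 10 columns wide.
letter, small_gap = [1, 1, 1], [0, 0]
word = letter + small_gap + letter + small_gap + letter
line = [0] * 4 + word + [0] * 10 + word + [0] * 4

fine = blobs(line, sigma=0.5)    # small scale: each letter is its own blob
coarse = blobs(line, sigma=2.0)  # larger scale: letters merge into words
print(len(fine), len(coarse))    # prints "6 2"
```

In this toy setting the small scale separates individual letters, while the larger scale bridges the narrow gaps inside a word and still leaves the wide gap between words open. A real page would need a two-dimensional treatment and some way of choosing the scale, neither of which this sketch attempts.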

Publication Date
1999
Comments
This is the pre-published version harvested from CIIR.
Citation Information
R. Manmatha and Nitin Srimal. "Scale Space Technique for Word Segmentation in Handwritten Manuscripts" (1999)
Available at: http://works.bepress.com/r_manmatha/11/