"Learning on the fly: Font free approaches to difficult OCR problems" by Andrew Kae

Article

Learning on the fly: Font free approaches to difficult OCR problems

Proceedings of the International Conference on Document Analysis and Recognition (2009)

Andrew Kae
Erik G Learned-Miller, University of Massachusetts - Amherst

Abstract

Despite ubiquitous claims that optical character recognition (OCR) is a “solved problem,” many categories of documents, such as those with moderate degradation or unusual fonts, continue to break modern OCR software. Many approaches rely on pre-computed or stored character models, but these are vulnerable when the font of a particular document was not part of the training set, or when a document is so noisy that the font model becomes weak. To address these difficult cases, we present a form of iterative contextual modeling that learns character models directly from the document it is trying to recognize. We use these learned models both to segment the characters and to recognize them in an incremental, iterative process. We present results comparable to those of a commercial OCR system on a subset of characters from a difficult test document.
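The abstract describes the approach only at a high level. As a rough illustration of the general principle (building character models from the document being read rather than from stored fonts), the following toy self-training loop grows per-character templates from a few seed labels. It is not the authors' algorithm and omits segmentation entirely: it assumes glyphs are already segmented into equal-sized pixel vectors, that a handful of seed labels are known, and that a simple Euclidean distance with a hypothetical threshold `max_dist` decides when a glyph is confidently recognized. All function names are hypothetical.

```python
import math
from statistics import fmean


def mean_template(glyphs):
    """Pixel-wise mean of equally sized glyph vectors."""
    return [fmean(px) for px in zip(*glyphs)]


def distance(a, b):
    """Euclidean distance between two glyph vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def learn_on_the_fly(glyphs, seed_labels, max_dist, max_iters=10):
    """Toy self-training: learn templates from the document's own glyphs.

    glyphs: list of pixel vectors (hypothetical, already segmented).
    seed_labels: dict mapping glyph index -> character (assumed known).
    max_dist: hypothetical confidence threshold for accepting a label.
    """
    labels = dict(seed_labels)
    for _ in range(max_iters):
        # Re-estimate one template per character from all glyphs labeled so far.
        by_char = {}
        for i, ch in labels.items():
            by_char.setdefault(ch, []).append(glyphs[i])
        templates = {ch: mean_template(gs) for ch, gs in by_char.items()}

        # Label only unlabeled glyphs that lie close to their nearest template.
        newly = {}
        for i, g in enumerate(glyphs):
            if i in labels:
                continue
            ch, d = min(
                ((c, distance(g, t)) for c, t in templates.items()),
                key=lambda pair: pair[1],
            )
            if d <= max_dist:
                newly[i] = ch

        if not newly:  # no confident additions: the models have stopped growing
            break
        labels.update(newly)
    return labels


if __name__ == "__main__":
    glyphs = [
        (1, 1, 0, 0),          # seed 'a'
        (0, 0, 1, 1),          # seed 'b'
        (1, 0.9, 0, 0.1),      # noisy 'a'
        (0.1, 0, 1, 0.9),      # noisy 'b'
        (0.5, 0.5, 0.5, 0.5),  # ambiguous, left unlabeled
    ]
    print(learn_on_the_fly(glyphs, {0: "a", 1: "b"}, max_dist=0.5))
    # prints {0: 'a', 1: 'b', 2: 'a', 3: 'b'}
```

Accepting only confident matches and then re-estimating the templates is what makes the process incremental: each round's templates are averaged over more examples from the same document, while ambiguous glyphs are deferred rather than forced into a class.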

Disciplines

Computer Sciences

Publication Date

2009

Citation Information

Andrew Kae and Erik G Learned-Miller. "Learning on the fly: Font free approaches to difficult OCR problems" Proceedings of the International Conference on Document Analysis and Recognition (2009)
Available at: http://works.bepress.com/erik_learned_miller/59/