
This paper focuses on improving the recognition of text in images of natural scenes, such as storefront signs or street signs. The problem is difficult because of varying lighting conditions, variation in font shape and color, and complex backgrounds. We present a word recognition system that addresses these difficulties with a new technique for extracting and recognizing foreground text in an image. First, we develop a new method, called bilateral regression, for extracting and modeling one coherent (although not necessarily contiguous) region of an image. The method models smooth color changes across the region without being corrupted by neighboring image regions. Second, rather than making a hard decision early in the pipeline about which region is foreground, we generate a set of foreground hypotheses and choose among them using feedback from a recognition system. Our segmentation method yields higher recognition performance than the current state of the art. Overall, our system also achieves a substantial increase in word accuracy on the word spotting task over the current state of the art on the ICDAR 2003 word recognition data set.
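
To make the idea of modeling one smoothly varying region while ignoring its neighbors more concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes a grayscale image, a planar intensity model a + b*x + c*y, and a Gaussian weight on the difference between each pixel value and the current model prediction. The function names `bilateral_regression` and `foreground_mask`, and the parameters `seed`, `sigma`, and `threshold`, are hypothetical and chosen only for illustration.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def bilateral_regression(image, seed, sigma=10.0, iterations=5):
    """Fit intensity ~ a + b*x + c*y by iteratively reweighted least squares.

    image: list of rows of grayscale values.
    seed:  initial guess for the region's intensity (an assumption of this sketch).
    Pixels whose value is far from the current prediction receive a weight
    near zero, so neighboring regions barely influence the fit.
    """
    coeffs = [float(seed), 0.0, 0.0]
    for _ in range(iterations):
        ata = [[0.0] * 3 for _ in range(3)]  # weighted normal-equation matrix
        atb = [0.0] * 3                      # weighted right-hand side
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                pred = coeffs[0] + coeffs[1] * x + coeffs[2] * y
                w = math.exp(-((value - pred) ** 2) / (2.0 * sigma ** 2))
                feats = (1.0, float(x), float(y))
                for i in range(3):
                    atb[i] += w * feats[i] * value
                    for j in range(3):
                        ata[i][j] += w * feats[i] * feats[j]
        coeffs = solve3(ata, atb)
    return coeffs


def foreground_mask(image, coeffs, threshold=20.0):
    """Mark pixels whose value lies within `threshold` of the fitted model."""
    return [
        [abs(v - (coeffs[0] + coeffs[1] * x + coeffs[2] * y)) < threshold
         for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

In this sketch, running the fit from several different `seed` values would produce several candidate masks. These could play the role of the foreground hypotheses described above, with a recognizer's confidence used to choose among them, although the scoring step is not shown here.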
Available at: http://works.bepress.com/erik_learned_miller/41/