We compare in this study two image restoration approaches for the pre-processing of printed documents: namely the Non-local Means filter and a total variation minimization approach. We apply these two ap- proaches to printed document sets from various periods, and we evaluate their effectiveness through character recognition performance using an open source OCR. Our results show that for each document set, one or both pre-processing methods improve character recog- nition accuracy over recognition without preprocessing. Higher accuracies are obtained with Non-local Means when characters have a low level of degradation since they can be restored by similar neighboring parts of non-degraded characters. The Total Variation approach is more effective when characters are highly degraded and can only be restored through modeling instead of using neighboring data.
©2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. DOI: 10.1109/ICDAR.2009.210
Available at: http://works.bepress.com/elisa_barney_smith/9/