Correcting Broken Characters in the Recognition of Historical Printed Documents

Abstract
This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
Description
Keywords
Citation
2003. Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries. IEEE Computer Society, Washington, DC, USA.