Post-OCR Error Detection by Generating Plausible Candidates
L3i Laboratory, University of La Rochelle
Description
The accuracy of Optical Character Recognition (OCR) technologies considerably affects the way digital documents
are indexed, accessed and exploited. Post-processing approaches detect and correct residual errors to improve the
quality of OCR texts. However, state-of-the-art approaches still leave room for improvement. Most existing post-OCR techniques
rely on predefined lists of error positions or apply simple techniques to detect errors. In this paper, we describe a novel error
detector that uses features ranging from the character level (including a character noisy channel and the index of peculiarity)
to the word level (such as n-gram and skip-gram frequencies and part-of-speech). Experimental results show that our approach
outperforms the best-performing techniques in the ICDAR 2017 Competition on Post-OCR Text Correction.
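To make the two feature levels more concrete, here is a minimal sketch of one character-level feature (an index-of-peculiarity score) and two word-level features (bigram and skip-gram counts) computed for a single token. It rests on assumptions of our own: the peculiarity score uses add-one smoothing so that unseen n-grams never hit log(0), which may differ from the formulation used in the paper; "skip-gram" is read here as the pair of words surrounding the token with the token itself skipped; and all function names are hypothetical. The character noisy channel and part-of-speech features, and the way the paper combines features into an error decision, are not reproduced.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Character n-grams of a lowercased word padded with '#' boundary markers."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_char_counts(lexicon):
    """Count character bigrams and trigrams over a reference word list."""
    bigrams, trigrams = Counter(), Counter()
    for w in lexicon:
        bigrams.update(char_ngrams(w, 2))
        trigrams.update(char_ngrams(w, 3))
    return bigrams, trigrams


def index_of_peculiarity(word, bigrams, trigrams):
    """Trigram-based peculiarity score (assumed add-one smoothed variant).

    For each trigram xyz, compare the mean log count of its bigrams xy and yz
    with the log count of xyz itself; a trigram much rarer than its parts
    yields a large term. The score is the root of the summed squared terms.
    """
    total = 0.0
    for tri in char_ngrams(word, 3):
        xy, yz = tri[:2], tri[1:]
        term = (math.log(bigrams[xy] + 1) + math.log(bigrams[yz] + 1)) / 2 \
            - math.log(trigrams[tri] + 1)
        total += term * term
    return math.sqrt(total)


def build_word_counts(sentences):
    """Count word bigrams (a, b) and 1-skip bigrams (a, '*', c) in a tokenized corpus."""
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[(a, b)] += 1
        for a, c in zip(toks, toks[2:]):
            counts[(a, "*", c)] += 1
    return counts


def word_features(tokens, i, counts):
    """Word-level context features for tokens[i] (hypothetical feature set)."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "left_bigram": counts[(left, tokens[i])],
        "right_bigram": counts[(tokens[i], right)],
        # Count of the surrounding context with the token skipped.
        "skipgram": counts[(left, "*", right)],
    }
```

With a toy corpus `[["the", "cat", "sat"], ["the", "dog", "sat"]]`, the OCR-style misreading `"tbe"` in `["tbe", "cat", "sat"]` gets zero counts for both of its bigrams while its skip-gram context (`<s> * cat`) is attested once. That combination hints that the context is plausible but the token itself is not. Likewise, with the lexicon `["the", "then", "there", "other", "these"]`, `"tqe"` receives a higher peculiarity score than `"the"`.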
Files
Name | Size | Checksum
---|---|---
Post-OCR Error Detection by Generating Plausible Candidates.pdf | 249.3 kB | md5:8292242f9bf97e2416ffc480ae020bdb