10.5281/zenodo.3381148
https://zenodo.org/records/3381148
oai:zenodo.org:3381148
Thi Tuyet Hai Nguyen
L3i Laboratory, University of La Rochelle
Adam Jatowt
L3i Laboratory, University of La Rochelle
Mickaël Coustaty
L3i Laboratory, University of La Rochelle
Nhu Van Nguyen
L3i Laboratory, University of La Rochelle
Antoine Doucet
L3i Laboratory, University of La Rochelle
Post-OCR Error Detection by Generating Plausible Candidates
Zenodo
2019
OCR error detection
post-OCR
OCRed text
2019-08-29
eng
10.5281/zenodo.3381147
https://zenodo.org/communities/newseye
https://zenodo.org/communities/eu
Creative Commons Attribution 4.0 International
The accuracy of Optical Character Recognition (OCR) technologies considerably impacts the way digital documents
are indexed, accessed and exploited. Post-processing approaches detect and correct remaining errors to improve the
quality of OCR texts. However, state-of-the-art approaches still leave room for improvement. Most existing post-OCR techniques
rely on predefined lists of error positions or apply simple techniques to detect errors. In this paper, we describe a novel error
detector that uses features ranging from the character level (including a character noisy channel and the index of peculiarity) to the word level
(such as n-gram and skip-gram frequencies, and part-of-speech features). Experimental results show that our approach outperforms the
best-performing techniques in the ICDAR 2017 Competition on Post-OCR Text Correction.
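The abstract only names the feature families. The Python sketch below is a rough illustration of what a per-token feature vector mixing character-level and word-level signals might look like. It is not the authors' implementation. The helper names (`CorpusStats`, `build_stats`, `peculiarity`, `features`), the `#` word-boundary padding, the add-one smoothing inside the logarithms, and scoring a word by the maximum over its trigrams are all assumptions made for this example. The noisy-channel, skip-gram and part-of-speech features are omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class CorpusStats:
    """Hypothetical container for the n-gram counts the features are computed from."""
    word_unigrams: Counter
    word_bigrams: Counter
    char_bigrams: Counter
    char_trigrams: Counter


def char_ngrams(word: str, n: int) -> list[str]:
    """Overlapping character n-grams of a lower-cased word padded with '#' at both ends."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_stats(corpus_tokens: list[str]) -> CorpusStats:
    """Count word unigrams/bigrams and character bigrams/trigrams in a reference corpus."""
    tokens = [t.lower() for t in corpus_tokens]
    char_bi: Counter = Counter()
    char_tri: Counter = Counter()
    for t in tokens:
        char_bi.update(char_ngrams(t, 2))
        char_tri.update(char_ngrams(t, 3))
    return CorpusStats(Counter(tokens), Counter(zip(tokens, tokens[1:])), char_bi, char_tri)


def peculiarity(word: str, stats: CorpusStats) -> float:
    """Peculiarity-style score: high when a trigram xyz is rare compared with its bigrams xy and yz.

    Assumptions (not taken from the paper): add-one smoothing inside each log,
    and the word score is the maximum over its trigrams.
    """
    scores = []
    for tri in char_ngrams(word, 3):
        xy, yz = tri[:2], tri[1:]
        bigram_term = (math.log(stats.char_bigrams[xy] + 1)
                       + math.log(stats.char_bigrams[yz] + 1)) / 2
        trigram_term = math.log(stats.char_trigrams[tri] + 1)
        scores.append((bigram_term - trigram_term) ** 2)
    return max(scores, default=0.0)


def features(tokens: list[str], i: int, stats: CorpusStats) -> dict[str, float]:
    """Character-level and word-level features for tokens[i] (hypothetical feature set)."""
    token = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"  # sentence-start placeholder
    return {
        "peculiarity": peculiarity(token, stats),
        "log_unigram_freq": math.log(stats.word_unigrams[token] + 1),
        "log_bigram_freq": math.log(stats.word_bigrams[(prev, token)] + 1),
    }


if __name__ == "__main__":
    stats = build_stats("the cat sat on the mat the cat ate".split())
    sentence = ["the", "tbe", "cat"]
    for i, tok in enumerate(sentence):
        print(tok, features(sentence, i, stats))
```

With this toy corpus, the garbled token `tbe` scores higher on peculiarity than `the`, and both of its word-frequency features are zero. A string whose bigrams are all unseen also gets a low peculiarity score, which is one reason character-level scores would be combined with word-level frequencies rather than used alone. In a detector like the one described, such values would presumably be passed to a classifier trained on labelled OCR output.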
European Commission
10.13039/501100000780
770299
NewsEye: A Digital Investigator for Historical Newspapers