Published April 19, 2020 | Version v1
Conference paper
Open
Neural Machine Translation with BERT for Post-OCR Error Detection and Correction
- 1. L3i, University of La Rochelle
- 2. Kyoto University
Description
OCR quality directly affects information access and indirectly affects the performance of natural language processing applications, which makes fine-grained (e.g., semantic) information access even harder. This work proposes a novel post-OCR approach, based on a contextual language model and neural machine translation, that aims to improve the quality of OCRed text by detecting and correcting erroneous tokens. The technique obtains results comparable to the best-performing approaches on the English datasets of the ICDAR 2017 and 2019 competitions on post-OCR text correction.
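The two-stage structure described above (first detect erroneous tokens, then correct only those) can be illustrated with a minimal sketch. This is not the authors' method: the paper uses a contextual language model for detection and neural machine translation for correction, whereas the sketch below substitutes a plain vocabulary lookup for detection and a closest-string match from Python's standard `difflib` module for correction. The vocabulary `VOCAB` and the function names are hypothetical, chosen only for illustration.

```python
import difflib

# Hypothetical toy vocabulary; a real system would rely on learned models instead.
VOCAB = ["the", "quality", "of", "text", "has", "a", "direct", "impact"]


def detect_errors(tokens, vocab):
    """Stage 1 stand-in: flag every token that is missing from the vocabulary."""
    known = set(vocab)
    return [tok.lower() not in known for tok in tokens]


def correct_token(token, vocab):
    """Stage 2 stand-in: return the closest vocabulary word, or the token unchanged."""
    matches = difflib.get_close_matches(token.lower(), vocab, n=1, cutoff=0.6)
    return matches[0] if matches else token


def post_ocr_correct(text, vocab=VOCAB):
    """Run detection, then correct only the tokens flagged as erroneous."""
    tokens = text.split()
    flags = detect_errors(tokens, vocab)
    return " ".join(
        correct_token(tok, vocab) if is_error else tok
        for tok, is_error in zip(tokens, flags)
    )
```

For example, `post_ocr_correct("the qualitv of tcxt")` flags `qualitv` and `tcxt` as errors and leaves the other tokens untouched. Keeping detection separate from correction means that tokens judged correct are never rewritten, which is the main reason for structuring a post-OCR pipeline in two stages.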
Files
Name | Size | Checksum |
---|---|---|
JCDL2020_shortpaper_Neural Machine Translation with BERT for Post-OCR Error Detection and Correction.pdf | 382.6 kB | md5:f74ab21f4d65ca69b3f91b2fdc20454f |