3459116
doi
10.5281/zenodo.3459116
oai:zenodo.org:3459116
user-newseye
user-eu
Doucet, Antoine
L3i Laboratory, University of La Rochelle
Coustaty, Mickael
L3i Laboratory, University of La Rochelle
Moreux, Jean-Philippe
National Library of France
ICDAR 2019 Competition on Post-OCR Text Correction
Rigaud, Christophe
L3i Laboratory, University of La Rochelle
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OCR errors
OCR
post-OCR text correction
<p>This paper describes the second round of the ICDAR 2019 competition on post-OCR text correction and presents the methods submitted by the participants. OCR has been an active research field for more than 30 years, but results are still imperfect, especially for historical documents. The purpose of this competition is to compare and evaluate automatic approaches for correcting (denoising) OCR-ed texts. The challenge consists of two tasks: 1) error detection and 2) error correction. An original dataset of 22M OCR-ed symbols, along with an aligned ground truth, was provided to the participants, with 80% of the dataset dedicated to training and 20% to evaluation. The dataset aggregates several sources, including newspapers, historical printed documents, manuscripts and shopping receipts, and covers 10 European languages (Bulgarian, Czech, Dutch, English, Finnish, French, German, Polish, Spanish and Slovak). Five teams submitted results; error detection scores range from 41% to 95%, and the best error correction improvement is 44%. This competition, which counted 34 registrations, illustrates the community's strong interest in improving OCR output, a key issue for any digitization process involving textual data.</p>
<p><strong>Dataset</strong></p>
<p>In addition to the paper, you may also be interested in <a href="https://zenodo.org/record/3515403">the datasets of the ICDAR 2019 Competition on Post-OCR Text Correction</a>.</p>
Zenodo
2019-09-24
info:eu-repo/semantics/conferencePaper
3459115
user-newseye
user-eu
award_title=NewsEye: A Digital Investigator for Historical Newspapers; award_number=770299; award_identifiers_scheme=url; award_identifiers_identifier=https://cordis.europa.eu/projects/770299; funder_id=00k4n6c32; funder_name=European Commission;
1579541829.539276
226881
md5:5224addcce8ebf49dcb2eb75a3d33ed4
https://zenodo.org/records/3459116/files/ICDAR2019_POCR_report.pdf
public
10.5281/zenodo.3459115
isVersionOf
doi