Journal article Open Access

An Optical Character Recognition Software Benchmark for Old Dutch Texts on the EYRA Platform

Cuper, Mirjam; Mendrik, Adriënne M.; van Meersbergen, Maarten; Klaver, Tom; Pawar, Pushpanjali; Langedijk, Annette; Wilms, Lotte

Digitized collections of printed historical texts are important for research in Digital Humanities.
However, acquiring high-quality machine-readable text with currently available Optical Character
Recognition (OCR) methods remains a challenge. OCR quality is affected by old fonts, old printing
techniques, ink bleed-through, paper quality, old spelling, multi-column layouts, and similar
factors. It is unclear which OCR methods perform best on such material. We are therefore setting up
a benchmark to evaluate the performance of OCR software on old Dutch texts. The benchmark is being
set up on the EYRA benchmark platform (eyrabenchmark.net), developed by the Netherlands eScience
Center and SURF.

Files (114.1 kB)
Name: DHBenelux2020_abstract.pdf
Size: 114.1 kB
md5:41833a7d5614fde680472bc5a91eccab
