An Optical Character Recognition Software Benchmark for Old Dutch Texts on the EYRA Platform

doi:10.5281/zenodo.3872918

DHBenelux 2020

Published June 1, 2020 | Version v1

Journal article Open

Digitized collections of printed historical texts are important for research in Digital Humanities.
However, acquiring high-quality machine-readable texts using currently available Optical Character
Recognition (OCR) methods is a challenge. OCR quality is affected by old fonts, old printing
techniques, ink bleed-through, paper quality, old spelling, multi-column layouts, and so on. It is
unclear which OCR methods perform best. Therefore, we are currently setting up a
benchmark to evaluate the performance of OCR software on old Dutch texts. The
benchmark is being set up on the EYRA benchmark platform (eyrabenchmark.net), developed by the
Netherlands eScience Center and SURF.

Files

Name	Size	Checksum
DHBenelux2020_abstract.pdf	114.1 kB	md5:41833a7d5614fde680472bc5a91eccab

Resource type

Journal article

Publisher

Zenodo

Conference

DH Benelux 2020 #GoesOnline, World Wide Web, 3-5 June 2020

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: June 2, 2020
Modified: June 2, 2020
