Published May 12, 2023 | Version 1.0
Dataset Open

Republic Print Dataset

  • 1. Digital Infrastructure Humanities Cluster KNAW

Description

The Republic Print dataset consists of 107 ground-truthed scans.

Using the annotation software provided through the Transkribus platform, we annotated the layout of the scans, which mostly show 18th-century printed documents from the National Archives of the Netherlands. The layout annotations consist of baselines and regions. The resulting ground truth was used to train a machine learning model, which yielded very accurate results. The ground truth is made available as an open access dataset.
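As an illustration of how baseline and region annotations can be read, the sketch below assumes the ground truth is stored as PAGE XML, the layout format commonly exported by Transkribus. The description does not confirm the file format inside the archive, so this is an assumption, and the function names (`read_layout`, `parse_points`) and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature PAGE XML document, used only for illustration.
# ASSUMPTION: the dataset stores its layout as PAGE XML; not confirmed here.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="200" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="0,0 200,0 200,100 0,100"/>
      <TextLine id="r1l1">
        <Baseline points="10,20 110,22"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Convert a PAGE 'x1,y1 x2,y2' string into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_layout(xml_text):
    """Return one dict per TextRegion with its id and the baselines inside it."""
    root = ET.fromstring(xml_text)
    regions = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        baselines = [
            parse_points(child.get("points", ""))
            for child in elem.iter()
            if local_name(child.tag) == "Baseline"
        ]
        regions.append({"id": elem.get("id"), "baselines": baselines})
    return regions


if __name__ == "__main__":
    for region in read_layout(SAMPLE):
        print(region["id"], region["baselines"])
```

Matching on local tag names keeps the sketch independent of the exact PAGE schema version. Nested regions, if present, would report their baselines under both the outer and the inner region.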

Files

republicprint.zip (689.4 MB)
md5:7201bd41d02cf776ed17bd00a5633065
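
The MD5 checksum listed above can be used to check that a download of republicprint.zip is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The local path and the helper names (`md5_of_file`, `verify_download`) are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for republicprint.zip on this page.
EXPECTED_MD5 = "7201bd41d02cf776ed17bd00a5633065"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("republicprint.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks avoids loading the full 689.4 MB archive into memory at once.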

Additional details

Funding

REPUBLIC 32205
Dutch Research Council