Published February 6, 2026
| Version v1
Dataset
Open
Graphemic HTR ground truth dataset for an Early New High German transcription model (15th century)
Authors/Creators
Description
This repository contains training data for ATR models (Kraken): 50 pages of ground truth as image files (jpg) and transcription files (PAGE XML). The ground truth comprises 2,177 lines with 18,626 word tokens and 113,491 characters.
Please refer to the README.md file for further information. A ground truth dataset following a diplomatic transcription of the same data contained in this repository may be found here: Diplomatic HTR-Ground Truth dataset.
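The line, token, and character figures above can be approximated directly from the PAGE XML transcription files. The following is a minimal sketch, not the authors' counting method: it assumes that each `TextLine` carries its line-level transcription in a direct `TextEquiv/Unicode` child, that tokens are whitespace-separated, and that characters include spaces. The function names and the directory argument are hypothetical, and the results may differ from the published numbers if the authors counted differently.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def line_texts(page_xml) -> list:
    """Return the line-level transcription of every TextLine in a PAGE XML document.

    Accepts the document as str or bytes. Lines without a transcription
    are returned as empty strings so that they still count as lines.
    """
    root = ET.fromstring(page_xml)
    texts = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        text = ""
        # Only inspect direct children, so nested Word-level text is ignored.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        texts.append(text)
    return texts


def corpus_stats(texts) -> dict:
    """Count lines, whitespace-separated tokens, and characters (spaces included)."""
    return {
        "lines": len(texts),
        "tokens": sum(len(t.split()) for t in texts),
        "characters": sum(len(t) for t in texts),
    }


def directory_stats(directory) -> dict:
    """Aggregate the counts over all .xml files below a directory (hypothetical layout)."""
    texts = []
    for path in sorted(Path(directory).rglob("*.xml")):
        texts.extend(line_texts(path.read_bytes()))
    return corpus_stats(texts)
```

Calling `directory_stats` on the folder where the archive was extracted would return a dictionary with the three counts; the folder layout inside the archive is described in the README.md, not here.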
Authors
The data in this repository was prepared and curated by Adam Juszczak (ORCiD: 0009-0000-5330-6183) and Frederik Skidzun (ORCiD: 0009-0002-7712-4207) of the Regesta Imperii - Regesta of Emperor Frederik III.
License
This dataset is made available under the CC-BY 4.0 license.
Files
graphemic-gt.zip
Total size: 153.2 MB

| MD5 checksum | Size |
|---|---|
| 09010d99143fd59179980204f924cfa3 | 649.0 kB |
| 4de34aedf731af66788a7584d2491668 | 152.6 MB |
| 022586c6f84dd4805a5b99205fe933c3 | 19.1 kB |
| 6eec5fef0980fa681e31264d79fd187d | 3.7 kB |
| 107a404a4aae75b0d77ef1acb4f68aea | 4.7 kB |
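The MD5 checksums listed for the files can be used to confirm that a download is complete and uncorrupted. The sketch below uses Python's standard `hashlib` module; the helper names are hypothetical, and because this listing does not show which checksum belongs to which file name, the expected value must be taken from the entry of the file actually downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str) -> bool:
    """Compare a file against an expected checksum, with or without the 'md5:' prefix."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `matches_checksum(downloaded_path, "md5:...")` with the checksum copied from the file's entry returns `True` only if the local copy is byte-identical to the published file.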
Additional details
Related works
- Is version of
- Dataset: 10.5281/zenodo.18377766 (DOI)