Published February 6, 2026 | Version v1
Dataset
Open
Diplomatic HTR ground truth dataset for an Early New High German transcription model (15th century)
Description
This repository contains training data for ATR models (Kraken): 50 pages of ground truth as image files (jpg) and transcription files (PAGE XML), comprising 2,177 lines with 18,626 word tokens and 112,790 characters.
Please refer to the README.md file for further information. A ground truth dataset that follows a graphemic transcription of the same data contained in this repository may be found here: Graphemic HTR-Ground Truth dataset.
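The line, word-token, and character totals given above can be checked roughly against the unpacked PAGE XML files. The following is a minimal sketch, not the authors' counting method: it assumes each transcribed line is stored as `TextLine/TextEquiv/Unicode` (the usual PAGE XML layout), that word tokens are whitespace-separated, and that spaces count as characters. The function names `count_page` and `count_directory` are hypothetical, and the results may differ from the published figures if the authors counted differently.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_page(root):
    """Return (lines, word tokens, characters) for one parsed PAGE XML tree."""
    lines = words = chars = 0
    # "{*}" matches any namespace, so the PAGE schema version does not matter
    # (wildcard support requires Python 3.8 or newer).
    for line in root.findall(".//{*}TextLine"):
        # Use only the line-level transcription, not nested Word elements.
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        text = (unicode_el.text or "") if unicode_el is not None else ""
        lines += 1
        words += len(text.split())  # assumption: whitespace-separated tokens
        chars += len(text)          # assumption: spaces count as characters
    return lines, words, chars


def count_directory(folder):
    """Sum the counts over every .xml file found below `folder`."""
    totals = [0, 0, 0]
    for path in sorted(Path(folder).rglob("*.xml")):
        counts = count_page(ET.parse(path).getroot())
        totals = [t + c for t, c in zip(totals, counts)]
    return tuple(totals)
```

Calling `count_directory` on the folder where the archive was unpacked returns a `(lines, words, characters)` tuple. Any non-PAGE XML files in that folder would simply contribute zero lines.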
Authors
The data in this repository was prepared and curated by Adam Juszczak (ORCiD: 0009-0000-5330-6183) and Frederik Skidzun (ORCiD: 0009-0002-7712-4207) of the Regesta Imperii - Regesta of Emperor Frederik III.
License
This dataset is made available under the CC-BY 4.0 license.
Files (153.2 MB)
diplomatic-gt.zip
| Checksum | Size |
|---|---|
| md5:8f7c58b9fd690f949ed9efd6c829ae7f | 656.1 kB |
| md5:4de34aedf731af66788a7584d2491668 | 152.6 MB |
| md5:022586c6f84dd4805a5b99205fe933c3 | 19.1 kB |
| md5:0397cb6f2e2d177e519fd53a820bb33b | 3.9 kB |
| md5:8fafe394c1c12b1a936e770de247c0c1 | 5.6 kB |
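The MD5 checksums listed for the files can be used to confirm that a download arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `matches_listed` are hypothetical, and the caller has to supply both the local path and the checksum string that belongs to that file.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listed(local_path, "md5:4de34aedf731af66788a7584d2491668")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) has exactly that digest.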
Additional details
Related works
- Is variant form of: Dataset, 10.5281/zenodo.18441031 (DOI)