Published May 21, 2024 | Version v1.4.8
Software Open

gt_structure_text

Description

The OCR-D Ground Truth text and structure corpus was created between 2015 and 2017. Since 2017, the corpus has been further curated and supplemented with metadata where appropriate. The corpus includes PAGE XML files containing annotations of the text and structure. The data is based on transcription data stored in the German Text Archive (DTA) (https://www.deutschestextarchiv.de/).
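For readers who have not worked with PAGE XML before, the following is a minimal sketch of how region-level text might be read from such a file using only the Python standard library. The sample document, its namespace version (2019-07-15), the region id and the helper names `local_name` and `region_texts` are illustrative assumptions, not taken from the corpus; the files in the archive may use a different schema version, which is why the sketch matches elements by local name rather than by a fixed namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; NOT a file from the corpus.
# The namespace version is an assumption and may differ in the real data.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="100" imageHeight="100">
    <TextRegion id="r1" type="paragraph">
      <Coords points="0,0 10,0 10,10 0,10"/>
      <TextEquiv><Unicode>Beispieltext</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def region_texts(page_xml):
    """Return (region id, text) pairs for every TextRegion in a PAGE XML string.

    Only the region-level TextEquiv/Unicode (a direct child of the region) is
    read; line- and word-level text is ignored. Text is None if a region has
    no region-level TextEquiv.
    """
    root = ET.fromstring(page_xml)
    results = []
    for element in root.iter():
        if local_name(element.tag) != "TextRegion":
            continue
        text = None
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode":
                    text = grandchild.text or ""
        results.append((element.get("id"), text))
    return results


pairs = region_texts(SAMPLE)
```

Matching on local names keeps the sketch independent of the schema version; a stricter reader would instead validate against the specific PAGE schema the files declare.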

Notes

If you use this dataset, please cite it using the metadata from this file.

Files

OCR-D/gt_structure_text-v1.4.8.zip

Files (1.7 GB)

Size: 1.7 GB
md5:fbb15e36b2e628ad93765e076e17e7af
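The MD5 checksum listed for the archive can be used to confirm that a download completed without corruption. A minimal sketch, assuming the archive has been saved locally; the local file name in the usage comment and the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum as listed for the archive in the Files section.
EXPECTED_MD5 = "fbb15e36b2e628ad93765e076e17e7af"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a 1.7 GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (the local path is an assumption about where the file was saved):
#   verify_download("gt_structure_text-v1.4.8.zip")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.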

Additional details

Related works