Published May 21, 2024 | Version v1.4.8
Software
Open
gt_structure_text
Creators
Description
The OCR-D Ground Truth text and structure corpus was created between 2015 and 2017. Since 2017, the corpus has been further curated and supplemented with metadata where appropriate. The corpus includes PAGE XML files that contain annotations of the text and structure. The data is based on transcription data stored in the German Text Archive (Deutsches Textarchiv, DTA) (https://www.deutschestextarchiv.de/).
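As a rough illustration of what "annotations of the text and structure" can look like in PAGE XML, the following sketch reads text regions and their lines from a small inline document. The sample document is hypothetical and heavily simplified (real corpus files also carry coordinates, metadata, and more), and the schema version in its namespace is an assumption; the code therefore matches elements by local name only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified PAGE XML document for illustration only.
# The namespace version is an assumption; corpus files may use another one.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif">
    <TextRegion id="r1" type="paragraph">
      <TextLine id="l1"><TextEquiv><Unicode>Erste Zeile</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Zweite Zeile</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def children(element, name):
    """Return the direct children of `element` whose local name is `name`."""
    return [child for child in element if local_name(child.tag) == name]


def region_lines(xml_text):
    """Return (region id, region type, [line texts]) for every TextRegion."""
    root = ET.fromstring(xml_text)
    result = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        lines = []
        for line in children(region, "TextLine"):
            for equiv in children(line, "TextEquiv"):
                for uni in children(equiv, "Unicode"):
                    lines.append(uni.text or "")
        result.append((region.get("id"), region.get("type"), lines))
    return result


if __name__ == "__main__":
    for region_id, region_type, lines in region_lines(SAMPLE):
        print(region_id, region_type, lines)
```

Matching on local names keeps the sketch independent of the exact PAGE schema version, at the cost of ignoring namespaces altogether.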
Notes
Files (1.7 GB)
Name | Size | Checksum |
---|---|---|
OCR-D/gt_structure_text-v1.4.8.zip | 1.7 GB | md5:fbb15e36b2e628ad93765e076e17e7af |
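The MD5 checksum listed for the archive can be used to check a download for corruption. A minimal sketch, assuming the archive has already been saved locally; the helper names `md5_of_file` and `matches_record` and the local file path are illustrative assumptions, not part of the record:

```python
import hashlib

# Checksum as listed in the file table of this record.
EXPECTED_MD5 = "fbb15e36b2e628ad93765e076e17e7af"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("gt_structure_text-v1.4.8.zip")` would return `True` for an intact copy, assuming the file was saved under that name in the current directory. MD5 is suitable here only as an integrity check against transfer errors, not as protection against deliberate tampering.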
Additional details
Related works
- Is supplement to
- Software: https://github.com/OCR-D/gt_structure_text/tree/v1.4.8 (URL)