Dataset Open Access
David Lassner;
Julius Coburger;
Clemens Neudecker;
Anne Baillot
The data set consists of one METS file for each PDF used for transcription and a directory, data/page_xml, that contains the ground-truth transcriptions in PAGE-XML format. A data paper with a detailed description of the data set will be published in parallel; we will link to it as soon as it is available. The corresponding source code is available at https://github.com/millawell/ocr-data/tree/1.1
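As a rough illustration of how the transcriptions in data/page_xml might be read, the sketch below collects the line-level text from one PAGE-XML file. It assumes the files follow the standard PAGE layout (`TextLine` elements carrying a `TextEquiv`/`Unicode` child) and matches elements by local name so that it does not depend on a particular schema version. The function name `text_lines` is hypothetical and is not taken from the linked repository.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Return the tag name without its XML namespace prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def text_lines(source):
    """Return the line-level transcriptions of one PAGE-XML document.

    `source` is a file path or a binary file object. Assumption: each
    TextLine stores its text in a direct TextEquiv/Unicode child, as in
    the standard PAGE schema.
    """
    root = ET.parse(source).getroot()
    lines = []
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        # Inspect direct children only, so word-level TextEquiv elements
        # nested inside Word are not counted as line text.
        for child in elem:
            if _local(child.tag) != "TextEquiv":
                continue
            unicode_el = next(
                (u for u in child if _local(u.tag) == "Unicode"), None
            )
            if unicode_el is not None:
                lines.append(unicode_el.text or "")
            break  # use only the first TextEquiv of the line
    return lines
```

Calling `text_lines` on each `.xml` file under data/page_xml (for example via `pathlib.Path("data/page_xml").glob("*.xml")`) would yield one list of strings per page, provided the extracted archive has that layout.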
Name | Size |
---|---|
2021-05-7_v1.1_ocr-data.tgz (md5:99a25e5a8cc8942e571cd908dfc61927) | 300.0 kB |
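The MD5 value listed with the archive can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the archive is assumed to have been saved locally under its listed file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum such as 'md5:99a2...' or a bare hex digest."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For the archive above, `matches_checksum("2021-05-7_v1.1_ocr-data.tgz", "md5:99a25e5a8cc8942e571cd908dfc61927")` should return `True` for an intact download.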