00000nmm##2200000uu#4500
4742068
doi 10.5281/zenodo.4742068
oai:zenodo.org:4742068
Julius Coburger, TU Berlin
Clemens Neudecker, (orcid)0000-0001-5293-8322, Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
Anne Baillot, (orcid)0000-0002-4593-059X, Le Mans Université
Data set of the paper "Publishing an OCR ground truth data set for reuse in an unclear copyright setting"
David Lassner, (orcid)0000-0001-9013-0834, TU Berlin
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International, https://creativecommons.org/licenses/by/4.0/legalcode, cc-by-4.0, spdx
OCR ground-truth

The data set consists of a METS file for each of the PDFs that were used for transcription, and a directory data/page_xml that contains the ground-truth transcriptions in PAGE-XML format. In parallel to the data set publication, a data paper will be published that contains a detailed description of the data set. As soon as it is published, we will link to it. The corresponding source code is available at https://github.com/millawell/ocr-data/tree/1.1

Zenodo, 2021-05-07
info:eu-repo/semantics/other
20210512104546.0
300004, md5:99a25e5a8cc8942e571cd908dfc61927, https://zenodo.org/records/4742068/files/2021-05-7_v1.1_ocr-data.tgz
open
10.5281/zenodo.4742067, isVersionOf, doi
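
As a rough illustration of how the archive described in this record might be checked and read, the following Python sketch compares a downloaded file against the MD5 digest listed in the record and pulls the line text out of a PAGE-XML document. The helper names `md5_of_file` and `page_xml_lines` are hypothetical and are not part of the data set or its source code. The sketch assumes that transcriptions follow the usual PAGE-XML nesting of `TextLine`, `TextEquiv` and `Unicode`, and it matches elements by local name because the record does not state which PAGE schema version is used.

```python
import hashlib
import xml.etree.ElementTree as ET

# MD5 digest copied from the record.
EXPECTED_MD5 = "99a25e5a8cc8942e571cd908dfc61927"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def page_xml_lines(xml_data):
    """Return the text of each TextLine in a PAGE-XML document.

    Assumption: line text sits in TextLine/TextEquiv/Unicode. Only the
    first TextEquiv directly under each TextLine is read, so TextEquiv
    elements of nested Word elements are ignored.
    """
    root = ET.fromstring(xml_data)
    lines = []
    for element in root.iter():
        if _local(element.tag) != "TextLine":
            continue
        for child in element:
            if _local(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if _local(grandchild.tag) == "Unicode":
                    lines.append(grandchild.text or "")
                    break
            break  # stop after the first TextEquiv of this line
    return lines
```

After downloading the archive, `md5_of_file("2021-05-7_v1.1_ocr-data.tgz") == EXPECTED_MD5` should hold if the file is intact. `page_xml_lines` can then be applied to the contents of a file from data/page_xml, read as bytes.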