Dataset Open Access

Data set of the paper "Publishing an OCR ground truth data set for reuse in an unclear copyright setting"

David Lassner; Julius Coburger; Clemens Neudecker; Anne Baillot


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4742068", 
  "title": "Data set of the paper \"Publishing an OCR ground truth data set for reuse in an unclear copyright setting\"", 
  "issued": {
    "date-parts": [
      [
        2021, 
        5, 
        7
      ]
    ]
  }, 
  "abstract": "The data set consists of a METS file for each of the PDFs that were used for transcription and a directory data/page_xml that contains the ground truth transcriptions in PAGE-XML format. In parallel to the data set publication, a data paper will be published that contains a detailed description of the data set. As soon as it is published, we will link to it. The corresponding source code can be found at https://github.com/millawell/ocr-data/tree/1.1", 
  "author": [
    {
      "family": "David Lassner"
    }, 
    {
      "family": "Julius Coburger"
    }, 
    {
      "family": "Clemens Neudecker"
    }, 
    {
      "family": "Anne Baillot"
    }
  ], 
  "version": "1.1", 
  "type": "dataset", 
  "id": "4742068"
}
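
A CSL JSON record like the one above can be read programmatically to produce a citation string. The following is a minimal sketch in Python, not an official Zenodo or CSL tool: the helper `format_citation` and the output layout are assumptions made for illustration, and the embedded record is a trimmed copy containing only the fields the sketch uses.

```python
import json
from datetime import date

# Trimmed copy of the CSL JSON export; only the fields used below are kept.
CSL_RECORD = """
{
  "publisher": "Zenodo",
  "DOI": "10.5281/zenodo.4742068",
  "title": "Data set of the paper \\"Publishing an OCR ground truth data set for reuse in an unclear copyright setting\\"",
  "issued": {"date-parts": [[2021, 5, 7]]},
  "author": [
    {"family": "David Lassner"},
    {"family": "Julius Coburger"},
    {"family": "Clemens Neudecker"},
    {"family": "Anne Baillot"}
  ],
  "version": "1.1",
  "type": "dataset"
}
"""


def format_citation(record: dict) -> str:
    """Hypothetical helper: build a plain-text citation from a CSL JSON dict.

    The layout is an illustrative assumption, not a formal citation style.
    """
    # A name may be split into "given"/"family" or, as in this export,
    # stored entirely in "family"; join whichever parts are present.
    authors = "; ".join(
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in record["author"]
    )
    # "date-parts" holds a list of [year, month, day] lists; use the first.
    year, month, day = record["issued"]["date-parts"][0]
    issued = date(year, month, day).isoformat()
    return (
        f'{authors} ({issued}). {record["title"]} '
        f'(Version {record["version"]}) [{record["type"]}]. '
        f'{record["publisher"]}. https://doi.org/{record["DOI"]}'
    )


record = json.loads(CSL_RECORD)
citation = format_citation(record)
print(citation)
```

The DOI is turned into a resolvable link by prefixing it with `https://doi.org/`, which is the standard DOI resolver address.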
                  All versions   This version
Views             44             44
Downloads         12             12
Data volume       3.6 MB         3.6 MB
Unique views      37             37
Unique downloads  9              9
