Published January 6, 2025 | Version 1.0.0
Dataset Open

OCR Groundtruth for Swinemünder Badeanzeiger

  • 1. Hochschule Wismar, University of Applied Sciences, Technology, Business and Design

Description

This dataset contains ground-truth annotations for extracting and structuring information from the tables of the historical newspaper "Swinemünder Badeanzeiger". The newspaper images were obtained from the Digitale Bibliothek Mecklenburg-Vorpommern: https://www.digitale-bibliothek-mv.de/viewer/toc/PPN636776093/

The data was created by selecting one "Swinemünder Badeanzeiger" image per year and manually transcribing its content. The dataset is organized by the newspaper's publication year: the folder for each year contains a folder named after the original image ID, which includes the following data:

  • table_[running_number].jpg: image of the segmented table
  • table_[running_number]_annotation.json: data extracted and structured from the segmented image by manual transcription
  • table_[running_number]_index_connected.json: list that connects each entry with its corresponding table rows, so that entries spanning multiple rows are preserved
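
The folder layout described above can be walked with a short script. The following is a minimal sketch in Python, not part of the dataset itself. It assumes the archive has been extracted locally and that year folders sit directly below the extraction directory; the function name `find_annotation_files` and the parameter `root` are hypothetical, and the actual nesting inside the archive may differ.

```python
from pathlib import Path


def find_annotation_files(root):
    """Return (year, image_id, path) for every table_*_annotation.json file.

    Assumed layout: <root>/<year>/<image_id>/table_<n>_annotation.json.
    This layout is inferred from the description and may need adjusting.
    """
    results = []
    for path in sorted(Path(root).glob("*/*/table_*_annotation.json")):
        image_id = path.parent.name          # folder named after the image ID
        year = path.parent.parent.name       # folder named after the year
        results.append((year, image_id, path))
    return results
```

Calling `find_annotation_files("SwineBad_Annotation")` would then yield one tuple per annotated table, provided the extracted folder carries that name.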

For each table entry, a JSON object was created and added to table_[running_number]_annotation.json. Each object consists of the following fields:

  • input: Transcription of the original row, including markers for columns
  • Nummer: The sequence number of the row as extracted from the input field
  • Vorname: The first name, if it exists; otherwise null
  • Nachname: The last name, if it exists; otherwise null
  • Titel: The (academic) title, if it exists; otherwise null
  • Beruf: The profession, if it exists; otherwise null
  • Sozialer Stand: The social status, if it exists; otherwise null
  • Begleitung: Any companions, such as family members or servants, if they exist; otherwise null
  • Wohnort: The city from which the person(s) arrived, if it exists; otherwise null
  • Wohnung: The local residence, such as a hotel, pension, or vacation home, if it exists; otherwise null
  • Personenanzahl: The overall number of persons that are represented by this entry
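
The field list above can serve as a simple consistency check when reading an annotation file. The sketch below is illustrative only: the helper names `missing_fields` and `split_columns` are hypothetical, and the `|` column marker is an assumption taken from the example entry shown in this description rather than from a formal specification.

```python
# Field names as listed in this description for the per-table annotation files.
EXPECTED_FIELDS = [
    "input", "Nummer", "Vorname", "Nachname", "Titel", "Beruf",
    "Sozialer Stand", "Begleitung", "Wohnort", "Wohnung", "Personenanzahl",
]


def missing_fields(entry):
    """Return the expected field names that are absent from an entry (a dict).

    A field whose value is null (None in Python) still counts as present.
    """
    return [name for name in EXPECTED_FIELDS if name not in entry]


def split_columns(input_text, marker="|"):
    """Split a transcribed row into its column texts.

    The marker "|" is assumed from the example entry in this description.
    """
    return [cell.strip() for cell in input_text.split(marker)]
```

For instance, `split_columns` applied to the `input` value of the example entry returns five cells: the running number, the name cell, the ditto mark, the residence, and the person count.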

In addition to the separate annotation files, the file swinebad_groundtruth.json contains a complete list of all entries to simplify data analysis. For this purpose, each entry was supplemented with the following field:

  • date: The publication date of the newspaper where the entry was published

The following example shows an entry obtained from the fourth line of the table published at https://www.digitale-bibliothek-mv.de/viewer/image/PPN636776093_1910/1/LOG_0003/

    {
        "input": "973 | Dr. Auerbach, Richard, Journalist, mit Frau | „ | Villa Kaiser Wilhelm | 2",
        "Nummer": "973",
        "Vorname": "Richard",
        "Nachname": "Auerbach",
        "Titel": "Dr.",
        "Beruf": "Journalist",
        "Sozialer Stand": null,
        "Begleitung": "mit Frau",
        "Wohnort": "Berlin",
        "Wohnung": "Villa Kaiser Wilhelm",
        "Personenanzahl": "2",
        "date": "1910-06-06"
    },
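
Entries of this shape lend themselves to simple aggregation. The following Python sketch sums the number of persons per publication year. It assumes that swinebad_groundtruth.json is a JSON array of objects like the one above, that `Personenanzahl` is a string of digits (as in the example), and that `date` uses the YYYY-MM-DD form; the function names `load_entries` and `persons_per_year` are hypothetical.

```python
import json
from collections import Counter


def load_entries(path="swinebad_groundtruth.json"):
    """Load all entries; assumes the file holds a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def persons_per_year(entries):
    """Sum Personenanzahl per publication year (first four characters of date).

    Entries with a missing or non-numeric count, or without a date,
    are skipped in this sketch.
    """
    totals = Counter()
    for entry in entries:
        count = entry.get("Personenanzahl")
        date = entry.get("date")
        if count is None or date is None:
            continue
        try:
            totals[date[:4]] += int(count)
        except (TypeError, ValueError):
            continue  # count is not a plain integer string
    return dict(totals)
```

A typical call would be `persons_per_year(load_entries())`, run from the directory that contains the combined file.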

You are welcome to cite the 'Digitale Bibliothek MV / Universität Greifswald' (+ URN for digital publications or the shelfmark for printed publications) as the source for the images.

Files

Files (34.4 MB)

Name: SwineBad_Annotation.zip
Size: 34.4 MB
md5:9954755c26c27607e7bc5678e8897afd

Additional details

Funding

Deutsche Forschungsgemeinschaft
Text+ 460033370