Published September 19, 2018 | Version v1
Dataset Open

READ ABP WI Dataset - Writer Identification over decades

  • 1. TU Wien
  • 2. Bistum Passau

Description

A person's handwriting is usually considered a unique characteristic of that person. However, it may change slightly over the course of a lifetime, for example due to physical or mental issues. To the best of our knowledge, no available dataset covers this evolution of a single person's handwriting.

When dealing with archival documents, it is important either to show that methods are invariant to these changes or to investigate how well they cope with them. Thus, a new dataset was created with data from the Passau Diocesan Archives (ABP, https://www.bistum-passau.de/bistum/archiv ).

The documents are death records from different villages and towns in the Diocese of Passau. Usually the writer of these records (mostly the priest) remains the same over several years. In total, the dataset consists of 1766 pages written by 28 different writers. The number of pages per writer ranges from 7 to 311. For some writers, we only have data from 3 different years, whereas the largest time span between two documents of the same writer is 31 years.

The dataset is organized as follows:

[ID]_[Name]\[YEAR]\[ID]_filename.png
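
The layout above can be turned into a per-writer index with a few lines of Python. The following is only a minimal sketch: the function names and the example paths are hypothetical, and it assumes that the writer ID contains no underscore and that the year folder is a plain number.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def parse_page_path(rel_path):
    """Split '[ID]_[Name]/[YEAR]/[ID]_filename.png' into its components."""
    # PureWindowsPath accepts both backslashes (as in the description)
    # and forward slashes as separators.
    parts = PureWindowsPath(rel_path).parts
    if len(parts) < 3:
        raise ValueError(f"unexpected path layout: {rel_path!r}")
    writer_dir, year, filename = parts[-3:]
    # Assumption: the ID is everything before the first underscore.
    writer_id, _, writer_name = writer_dir.partition("_")
    return {
        "writer_id": writer_id,
        "writer_name": writer_name,
        "year": int(year),  # raises ValueError if the folder is not numeric
        "filename": filename,
    }


def years_per_writer(rel_paths):
    """Map each writer ID to the sorted list of years it appears in."""
    years = defaultdict(set)
    for rel_path in rel_paths:
        info = parse_page_path(rel_path)
        years[info["writer_id"]].add(info["year"])
    return {writer: sorted(values) for writer, values in years.items()}
```

Such an index makes it easy to select, for instance, only writers whose pages span many years when studying how handwriting changes over time.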

The corresponding PAGE XML file is provided along with the dataset and describes the regions of the image that contain text. It can be used to compute writer features from the handwriting alone, excluding the table lines.
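
As an illustration of how the text regions might be read, the sketch below extracts the outline polygon of every TextRegion element from a PAGE XML document. It assumes the region outline is stored in the points attribute of a Coords child, as in the 2013 and later PAGE schemas; older schema versions that use Point child elements are not handled. The function names are hypothetical, and tags are matched by local name so that the exact schema namespace does not matter.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def text_region_polygons(xml_text):
    """Return one list of (x, y) points per TextRegion in the document."""
    root = ET.fromstring(xml_text)
    polygons = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        # Only look at direct children, so Coords of nested text lines
        # are not mistaken for the region outline.
        for child in elem:
            if local_name(child.tag) == "Coords":
                points = child.get("points", "")
                polygon = [
                    tuple(int(v) for v in pair.split(","))
                    for pair in points.split()
                ]
                if polygon:
                    polygons.append(polygon)
                break
    return polygons


def bounding_box(polygon):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```

The bounding boxes can then be used to crop the page image to its text regions before feature extraction, so that the table lines outside these regions are ignored.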

Currently, no research tasks are defined on the dataset; we leave this up to the community. Drop us a note on how you are using this dataset.

Notes

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943.

Files

ABP-WI-dataset.zip (4.7 GB)
md5:5022a1eb0db0500b99cb8895d2d3bc88
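
Given the size of the archive, it may be worth checking the download against the MD5 checksum listed above. A minimal sketch using only the Python standard library follows; the function name is hypothetical and the archive is assumed to be in the current directory.

```python
import hashlib

# Checksum as listed for ABP-WI-dataset.zip in this record.
EXPECTED_MD5 = "5022a1eb0db0500b99cb8895d2d3bc88"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so the 4.7 GB archive never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, assuming the archive was saved under its original name:
# assert md5_of_file("ABP-WI-dataset.zip") == EXPECTED_MD5
```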

Additional details

Funding

READ – Recognition and Enrichment of Archival Documents 674943
European Commission