Published November 12, 2025 | Version v1
Dataset Open

DARE Database

  • 1. University of Southern Denmark
  • 2. University of Bristol

Description

The DARE Database is a collection of images of handwritten dates drawn from historical sources in Sweden and Denmark. Additional details are available on our GitHub and on arXiv.

The dataset is provided as seven splits, one per data source. Each folder contains the minipics for its source and their labels, divided into training and test files. The image and token counts are:

Train images: 2,876,752
Test images: 152,414
Total number of images: 3,029,166
Total number of tokens: 9,682,027

The image counts break down by data source as follows:

Dataset                       Sequence     Training observations   Test observations
Death Certificates (1)        DD-MM-YYYY                  11,627               1,000
Death Certificates (2)        DD-MM-YYYY                 155,439               8,338
Police Records (1)            DD-MM-YY                 1,006,199              53,488
Police Records (2)            DD-MM-YY                   326,478              17,103
Swedish Records Birth Dates   DD-MM-YY                   597,756              31,389
Swedish Records Death Dates   DD-MM                      547,813              28,803
Funeral Records               DD-MM                      231,440              12,293
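As a quick consistency check, the per-source counts in the table can be summed and compared with the stated totals. The sketch below is illustrative only: the dictionary `SPLITS` and the helper `summarize` are hypothetical names introduced here, and the numbers are copied by hand from the table rather than read from the dataset files.

```python
# Per-source image counts copied from the table above: (training, test).
# SPLITS and summarize are hypothetical names used only for this sketch.
SPLITS = {
    "Death Certificates (1)": (11_627, 1_000),
    "Death Certificates (2)": (155_439, 8_338),
    "Police Records (1)": (1_006_199, 53_488),
    "Police Records (2)": (326_478, 17_103),
    "Swedish Records Birth Dates": (597_756, 31_389),
    "Swedish Records Death Dates": (547_813, 28_803),
    "Funeral Records": (231_440, 12_293),
}


def summarize(splits):
    """Sum training and test counts and report the share held out for testing."""
    train = sum(n_train for n_train, _ in splits.values())
    test = sum(n_test for _, n_test in splits.values())
    total = train + test
    return {
        "train": train,
        "test": test,
        "total": total,
        "test_share": test / total,
    }


summary = summarize(SPLITS)
print(summary)
```

The sums reproduce the totals listed above: 2,876,752 training images, 152,414 test images, and 3,029,166 images overall.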

Note that the CIHVR images are excluded because we do not have permission to share them publicly.

Our images consist purely of digits, with one exception: the month in a date sequence is sometimes written with alphabetic characters, e.g., "February" or "Feb". The original images were acquired from Copenhagen Archives, the National Archives of Denmark, and Lund University. The minipics were created using Coherent Point Drift to extract the regions of interest from the source documents.

A note on the Swedish cause of death records: many of their cells are labelled as either empty or partly empty. A partly empty label, e.g., ' 29-" ', indicates that the month cell is not actually empty; rather, the month is the same as above. Historical tabulated records commonly use a special mark to denote "same as above". Cells labelled ' ,-,-, ' (birth dates) or ' ,-, ' (death dates) are completely empty and could be excluded when training pure digit recognition models. When transcribing historical records, however, empty cells occur frequently and should be accounted for in one way or another.
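The distinction between empty, partly empty, and fully written labels can be sketched as a small classifier. This is a minimal illustration, not the authors' preprocessing: it assumes the label strings match the examples quoted above once surrounding whitespace is stripped, and that the "same as above" mark is stored as a double-quote character. The names `classify_label` and `keep_for_digit_model` are hypothetical.

```python
# Assumption: after stripping whitespace, empty cells appear exactly as in the
# examples quoted in the text (birth dates and death dates respectively).
EMPTY_LABELS = {",-,-,", ",-,"}

# Assumption: the "same as above" mark is stored as a double-quote character.
DITTO_MARK = '"'


def classify_label(label: str) -> str:
    """Return 'empty', 'partly_empty', or 'full' for a single label string."""
    text = label.strip()
    if text == "" or text in EMPTY_LABELS:
        return "empty"
    if DITTO_MARK in text:
        return "partly_empty"
    return "full"


def keep_for_digit_model(labels):
    """Keep only fully written labels, e.g. for a pure digit recognition model."""
    return [label for label in labels if classify_label(label) == "full"]


examples = [' 29-" ', " ,-,-, ", " ,-, ", "12-03-45"]
for label in examples:
    print(repr(label), "->", classify_label(label))
```

For transcription tasks, one would more likely keep all three classes and treat "empty" and "partly_empty" as targets in their own right rather than filtering them out.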

Note: If you want to download a small sample to see how the DARE Database is structured, visit our DARE sample Zenodo page.

Files

DARE.zip

Files (39.2 GB)

md5:976408a621d716f08cb54b9624953ef2
39.2 GB
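
Given the size of the archive, it may be worth verifying the download against the md5 checksum listed above. The following sketch streams the file in chunks so that it does not need to fit in memory. The helper names `md5_of_file` and `verify` are hypothetical, and the local path `DARE.zip` in the usage comment is an assumption about where the file was saved.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "976408a621d716f08cb54b9624953ef2"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected md5 checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (assumes the archive was saved as DARE.zip in the working directory):
# print(verify("DARE.zip"))
```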