Published July 14, 2020 | Version 1.0.0
Dataset Open

Finnish Court Records-sub500. A dataset of Finnish notarial records (19th Century)

  • 1. PRHLT
  • 2. National Archives of Finland

Description

This dataset is a selection of 500 pages from the Renovated District Court Records (19th century), one of the largest collections in the National Archives of Finland. The documents consist of records of deeds, mortgages and traditional life annuities, among others.
The dataset contains images showing either one or two document pages. Annotation is done at image level, using six region types together with baselines and line-level transcriptions (in Swedish). This mix of single-page and double-page images is a common source of complexity in historical documents.

Layout labels are:
1. page-number: the page number, usually placed in the top-right corner of the image.
2. paragraph: a paragraph on a single-page image or on the left page of a double-page image.
3. paragraph_2: a paragraph on the right page of a double-page image.
4. marginalia: any annotation in the margin of the document.
5. table: a table on a single-page image or on the left page of a double-page image.
6. table_2: a table on the right page of a double-page image.
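The `_2` suffix in the list above encodes page side only for paragraphs and tables. As a minimal sketch, a user could decode a label into its base region type and implied page side like this; the function name `split_label` and the side names "right" and "left_or_single" are hypothetical, not part of the dataset.

```python
from typing import Optional, Tuple

# Base region types that have a "_2" (right-hand page) variant in the label list.
SIDED_TYPES = {"paragraph", "table"}
# Region types listed without any page-side variant.
UNSIDED_TYPES = {"page-number", "marginalia"}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Return (base_type, side) for a layout label (hypothetical helper).

    side is "right" for the *_2 labels, "left_or_single" for their plain
    counterparts, and None for labels that carry no side information.
    """
    if label.endswith("_2") and label[:-2] in SIDED_TYPES:
        return label[:-2], "right"
    if label in SIDED_TYPES:
        return label, "left_or_single"
    if label in UNSIDED_TYPES:
        return label, None
    raise ValueError(f"unknown layout label: {label!r}")
```

For example, `split_label("table_2")` gives `("table", "right")`, while `split_label("marginalia")` gives `("marginalia", None)`.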

The images, together with their ground truth in PAGE-compliant XML format, were compiled by the National Archives of Finland and the HTR group of the Pattern Recognition and Human Language Technologies (PRHLT) Research Center.
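The description does not say exactly how the six labels are stored in the XML, so the following Python sketch is hedged: it assumes a label appears either as a Transkribus-style `structure {type:...;}` entry in a region's `custom` attribute or in the standard PAGE `type` attribute. It uses only the standard library's `xml.etree.ElementTree` and matches tags by local name, so it does not depend on which PAGE schema namespace a file declares. The helper names and the `SAMPLE` snippet are hypothetical illustrations, not files from the dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Optional, Tuple


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def _child(el: ET.Element, name: str) -> Optional[ET.Element]:
    """First direct child with the given local name, or None."""
    for c in el:
        if _local(c.tag) == name:
            return c
    return None


def region_label(region: ET.Element) -> str:
    """Best-effort label (assumption): 'structure {type:...;}' in @custom,
    else the PAGE @type attribute, else the element name."""
    custom = region.get("custom", "")
    marker = "structure {type:"
    start = custom.find(marker)
    if start != -1:
        end = custom.find(";", start + len(marker))
        if end != -1:
            return custom[start + len(marker):end]
    return region.get("type") or _local(region.tag)


def parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE points string such as '10,20 30,22' into (x, y) pairs."""
    pairs = (p.split(",") for p in points.split())
    return [(int(x), int(y)) for x, y in pairs]


def read_page(xml_text: str) -> Tuple[Counter, List[Dict]]:
    """Count region labels and collect each line's baseline and transcription."""
    root = ET.fromstring(xml_text)
    parent = {c: p for p in root.iter() for c in p}  # child -> parent map
    labels: Counter = Counter()
    lines: List[Dict] = []
    for el in root.iter():
        name = _local(el.tag)
        if name.endswith("Region"):
            labels[region_label(el)] += 1
        elif name == "TextLine":
            # Walk up to the nearest enclosing region to label the line.
            anc = parent.get(el)
            while anc is not None and not _local(anc.tag).endswith("Region"):
                anc = parent.get(anc)
            base = _child(el, "Baseline")
            equiv = _child(el, "TextEquiv")
            uni = _child(equiv, "Unicode") if equiv is not None else None
            lines.append({
                "region": region_label(anc) if anc is not None else None,
                "baseline": parse_points(base.get("points", "")) if base is not None else [],
                "text": (uni.text or "") if uni is not None else "",
            })
    return labels, lines


# Hypothetical minimal PAGE snippet used only to illustrate the parser.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="2000" imageHeight="1500">
    <TextRegion id="r1" custom="structure {type:paragraph_2;}">
      <TextLine id="l1">
        <Baseline points="10,20 30,22"/>
        <TextEquiv><Unicode>Hypothetical line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2" type="page-number"/>
  </Page>
</PcGts>"""
```

Calling `read_page(SAMPLE)` counts one `paragraph_2` and one `page-number` region and returns a single line entry with its baseline points and text. Labelling lines through a parent map, rather than iterating only direct children of each region, keeps lines nested inside table cells attached to their enclosing region.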

Files

Files (1.2 GB)

md5:461dcba83e99e3fb665ab8dd68e8cc47 (1.2 GB)
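To check a download against the published checksum, a chunked MD5 computation with Python's standard `hashlib` module avoids loading the whole 1.2 GB file into memory. This is a minimal sketch; the function names are illustrative, and the file name in the usage line below is a placeholder because the listing does not show it.

```python
import hashlib
from pathlib import Path

# Checksum published in the Files section of this record.
EXPECTED_MD5 = "461dcba83e99e3fb665ab8dd68e8cc47"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB chunks at a time."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

Usage: `verify(Path("path/to/downloaded_file"))` returns `True` only if the downloaded file matches the published checksum.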