
Published September 12, 2023 | Version 1.2
Dataset Open

PARES : PArish REgistry Survey − Historical Census Table Dataset (19th, 20th centuries) − France

  • 1. Laboratoire L3i, Université de La Rochelle
  • 2. TEKLIA

Description

The dataset contains 250 images of handwritten census tables covering years from around 1650 to 1850 A.D. They come from two French communes, Vic-sur-Seille (department of Moselle) and Echevronne (department of Côte-d'Or). Although they describe much earlier periods, the documents themselves are fairly recent: they are handwritten transcriptions copied from the original documents during the 20th century. The copies were made by only a few writers.

Based on damage and degradation, we identify seven document categories (C1 to C7). C1 and C3 contain good-quality documents without serious damage and together account for almost 90% of the documents. The other categories include highly damaged documents or documents with specific characteristics.

A notable aspect of this dataset is that the records are written on only two different physical paper templates. Categories C1, C2, C3, C6 and C7 use a template that holds 25 records, while categories C4 and C5 use a taller template that can hold up to 35 records. C4 and C5 therefore differ from the rest of the documents, but they represent less than 8% of the corpus, which is hence largely homogeneous.

Changelog

  • 2023-05-18 / Version 1: Initial release of the dataset
  • 2023-09-11: Connected components in labels were eroded, and the coordinates in labels_json do not match the coordinates of the connected components in labels.
  • 2023-09-12: Some elements exported in the labels directory contained noise introduced by another tool when exporting the annotations from the platform. The noise has been removed and the images are now clean.

 

Files

2023_03_20_first_manual_split_distribution_between_document_categories.csv
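
The column layout of this CSV file is not described in the record, so the following is only a minimal sketch for inspecting it after download. It assumes the file sits in the current directory, is UTF-8 encoded, and is comma-delimited (pass another delimiter if not). The helper name `preview_csv` is hypothetical and not part of the dataset.

```python
import csv
from pathlib import Path

# File name as listed in this record; the local download location is an assumption.
CSV_NAME = "2023_03_20_first_manual_split_distribution_between_document_categories.csv"


def preview_csv(path, delimiter=",", max_rows=5):
    """Return (header, first data rows, number of data rows).

    No column names are assumed: the first line is treated as the header
    and every following line as a data row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file is empty
        rows = list(reader)
    return header, rows[:max_rows], len(rows)


if __name__ == "__main__":
    path = Path(CSV_NAME)
    if path.exists():
        header, head, total = preview_csv(path)
        print("columns:", header)
        print("data rows:", total)
        for row in head:
            print(row)
    else:
        print(f"{CSV_NAME} not found; download it from the record first.")
```

Printing the header first shows which columns are actually present before any per-category analysis is attempted.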
