Published May 14, 2025 | Version 1
Dataset Open

[Dataset] A dataset for information extraction from 19th-century French Land Registry tables

  • 1. Université Gustave Eiffel
  • 2. Laboratoire en Sciences et Technologies de l'Information Géographique pour la ville intelligente et les territoires durables
  • 3. Institut national de l'information géographique et forestière
  • 4. EPITA

Description

This dataset was used to fine-tune and evaluate a DAN (Document Attention Network) model for information extraction from 19th-century land registry documents (initial registers, called états de sections in French).

Training, evaluation and test subsets are already provided. Images were digitized by the archives of the French département of Val-de-Marne. They are grouped by town, which means that images from one town (i.e., from the same register) never appear in more than one subset.
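The town-level grouping can be checked with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function name `find_shared_towns` and the town labels are hypothetical, and how a town name is obtained for each image (file name, folder, metadata) is left to the reader because the archive layout is not documented here.

```python
from collections import defaultdict


def find_shared_towns(subsets):
    """Return the towns that occur in more than one subset.

    `subsets` maps a subset name to an iterable of town names,
    one entry per image. The result maps each offending town to
    the sorted list of subsets it appears in.
    """
    seen = defaultdict(set)
    for subset_name, towns in subsets.items():
        for town in towns:
            seen[town].add(subset_name)
    return {town: sorted(names) for town, names in seen.items() if len(names) > 1}


# Placeholder town labels, not taken from the dataset.
example = {
    "train": ["town_a", "town_a", "town_b"],
    "valid": ["town_c"],
    "test": ["town_d", "town_d"],
}
shared = find_shared_towns(example)
```

An empty result means that no register is split across subsets, which is the property described above.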

Image description

Additional documentation to come.

Columns (entities in atr-DAN)

  • ancien_numero_parcelle : former plot number (given in only one of the three table types)
  • ancienne_nature : former plot nature (given in only one of the three table types)
  • identite : taxpayer identity
  • lieu-dit : plot address
  • nature : plot nature
  • numero_parcelle : plot number
  • numero_proprietaire : taxpayer ID in the next register

Additional tokens

The text includes some special tokens that represent additional layout information:

  • → : line break (the text continues on a new line)

  • ↑TEXT↓ : superscript

  • ×TEXT± : crossed-out text (most of the time, this indicates outdated or erroneous information)
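The tokens listed above can be turned into a plain-text reading with a short helper. This is a minimal sketch, assuming that tokens are not nested and that each opening marker has a matching closing marker. The function name `to_plain_text`, the `keep_crossed_out` option and the sample string are illustrative and do not come from the dataset or from atr-dan.

```python
import re

LINE_BREAK = "→"
SUPERSCRIPT = re.compile(r"↑(.*?)↓")   # ↑TEXT↓
CROSSED_OUT = re.compile(r"×(.*?)±")   # ×TEXT±


def to_plain_text(text, keep_crossed_out=False):
    """Convert the layout tokens of a transcription to plain text.

    Crossed-out spans are dropped by default, or kept without their
    markers when `keep_crossed_out` is True. Superscript markers are
    removed and the superscript text is kept inline. Each line-break
    token becomes a newline, and runs of whitespace are collapsed.
    """
    if keep_crossed_out:
        text = CROSSED_OUT.sub(r"\1", text)
    else:
        text = CROSSED_OUT.sub("", text)
    text = SUPERSCRIPT.sub(r"\1", text)
    lines = [" ".join(line.split()) for line in text.split(LINE_BREAK)]
    return "\n".join(lines)


# Made-up sample, only meant to exercise the three tokens.
sample = "×Dupont Jean± Martin P↑re↓→à Paris"
cleaned = to_plain_text(sample)
```

Dropping crossed-out spans suits downstream use of the current information only. Keeping them may be preferable when the outdated values are themselves of interest.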

Notes

  • This dataset has been formatted with the atr-dan Python library, so you can skip the dataset generation step if you use the scripts available in the GitHub repository of the TPDL paper.

Files

land_registry_dan_dataset.zip

Files (736.5 MB)

Name: land_registry_dan_dataset.zip
Size: 736.5 MB
Checksum: md5:d861a7a1975602b8fa831939a23a141d
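The MD5 checksum listed above can be used to confirm that the archive was downloaded without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `archive_is_intact` are illustrative, and the local path of the archive is an assumption.

```python
import hashlib

# Checksum published on the dataset record.
EXPECTED_MD5 = "d861a7a1975602b8fa831939a23a141d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected one."""
    return md5_of_file(path) == expected.lower()
```

For instance, `archive_is_intact("land_registry_dan_dataset.zip")` would check an archive saved in the current directory. Reading in chunks avoids loading the whole 736.5 MB file into memory.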

Additional details

Additional titles

Translated title (French)
[Dataset] Dataset pour l'extraction d'information dans les tables d'états de sections du cadastre napoléonien

Funding

Agence de l'innovation de défense
Institut national de l'information géographique et forestière