Published May 14, 2025
| Version 1
Dataset
Open
[Dataset] A dataset for information extraction from 19th-century French Land Registry tables
Authors/Creators
Description
This dataset was used to fine-tune and evaluate a DAN (Document Attention Network) model for information extraction from 19th-century land registry documents (initial registers, known in French as états de sections).
Training, evaluation and test subsets have already been created. Images were digitized by the Archives of the Val-de-Marne department (France). They are grouped by town, which means that images from one town (i.e., from the same register) cannot appear in more than one subset.
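The town-level grouping described above can be checked with a few lines of Python. This is only a minimal sketch: the function name `towns_in_multiple_subsets` and the input mapping (subset name to list of town names) are assumptions made for illustration, and how town names are recovered from the archive's file layout is not documented here.

```python
from collections import defaultdict


def towns_in_multiple_subsets(subset_towns):
    """Return the towns that appear in more than one subset.

    `subset_towns` is a hypothetical mapping such as
    {"train": ["town_a", ...], "test": [...]}; it is not a structure
    provided by the dataset itself.
    """
    seen = defaultdict(set)
    for subset, towns in subset_towns.items():
        for town in towns:
            seen[town].add(subset)
    # Keep only towns recorded under at least two subsets.
    return {town: subsets for town, subsets in seen.items() if len(subsets) > 1}
```

An empty result means that no town (and therefore no register) is shared between subsets.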
Images description
Additional documentation to come.
Columns (entities in atr-DAN)
- ancien_numero_parcelle : former plot number (given in only one of the three table types)
- ancienne_nature : former plot nature (given in only one of the three table types)
- identite : taxpayer identity
- lieu-dit : plot address
- nature : plot nature
- numero_parcelle : plot number
- numero_proprietaire : taxpayer id in the next register
Additional tokens
The text includes special tokens that represent additional layout information:
- → : line break (start of a new line)
- ↑TEXT↓ : superscript (exponent)
- ×TEXT± : crossed-out text (most of the time, this marks outdated or erroneous information)
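For readers who want plain text rather than tokenized transcriptions, the tokens listed above could be handled along the following lines. This is a minimal sketch, not part of atr-dan: the helper name `clean_transcription` and its `keep_crossed_out` parameter are hypothetical, and the code assumes that the markers are balanced and not nested.

```python
import re

# Layout tokens described in the list above.
NEWLINE = "→"
SUP_OPEN, SUP_CLOSE = "↑", "↓"
STRIKE_OPEN, STRIKE_CLOSE = "×", "±"


def clean_transcription(text, keep_crossed_out=False):
    """Convert a tokenized transcription to plain text (hypothetical helper)."""
    if keep_crossed_out:
        # Keep the crossed-out words, drop only the markers.
        text = text.replace(STRIKE_OPEN, "").replace(STRIKE_CLOSE, "")
    else:
        # Drop the markers together with the text they enclose.
        text = re.sub(f"{STRIKE_OPEN}[^{STRIKE_CLOSE}]*{STRIKE_CLOSE}", "", text)
    # Superscripts: keep the content, drop the markers.
    text = text.replace(SUP_OPEN, "").replace(SUP_CLOSE, "")
    # Split on the line-break token and normalize whitespace per line.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.split(NEWLINE)]
    return "\n".join(line for line in lines if line)
```

Dropping crossed-out text by default follows the remark that it usually marks outdated or erroneous information; pass `keep_crossed_out=True` to retain it.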
Notes
- This dataset has been formatted using the atr-dan Python library. This means that you can skip the dataset generation step if you use the scripts available in the GitHub repository of the TPDL paper.
Files
| Name | Size | Checksum |
|---|---|---|
| land_registry_dan_dataset.zip | 736.5 MB | md5:d861a7a1975602b8fa831939a23a141d |
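The MD5 checksum listed above can be used to confirm that the archive was downloaded intact. The sketch below uses only the Python standard library; the function names and the assumption that the archive sits in the current directory under its published name are illustrative.

```python
import hashlib

# Checksum published on this record for land_registry_dan_dataset.zip.
EXPECTED_MD5 = "d861a7a1975602b8fa831939a23a141d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="land_registry_dan_dataset.zip"):
    """Return True if the file at `path` matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use low for a file of this size (736.5 MB).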
Additional details
Additional titles
- Translated title (French)
- [Dataset] Dataset pour l'extraction d'information dans les tables d'états de sections du cadastre napoléonien
Funding
- Agence de l'innovation de défense
- Institut national de l'information géographique et forestière