LECTAUREP Contemporary French Model (Administration)
Contributors
Contact person:
Data collectors:
Data curator:
- 1. Alix
- 2. Aurélia
- 3. Marie-Françoise
- 4. Nathalie
- 5. Marc
Description
The model was trained on the ground truth produced by the LECTAUREP Project (Inria & Archives Nationales) between 2019 and 2022. The training dataset contains many examples of handwriting taken from French administrative documents produced between 1742 and 1928.
Training and Testing datasets
The data was collected from LECTAUREP's ground truth repositories:
- lectaurep-bronod v0.0.1
- lectaurep-mariages-et-divorces v.1.0
- lectaurep-repertoires v2.0
Twelve pages were set aside to form the test set.
The training dataset contained:
- 308 files
- 19 364 lines
- 329 270 characters
The test dataset contained:
- 12 files
- 962 lines
- 15 243 characters
Transcription standards
The transcriptions were created with eScriptorium. They respect what is written (abbreviations are not expanded, capitalization follows 19th century practices). Superscripted portions of text are signaled by `^` and many signatures are transcribed as ¥.
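To illustrate how the two markers might be handled downstream, here is a minimal sketch that counts them in a list of transcribed lines. The function name and the sample lines are hypothetical; in particular, the exact placement of `^` relative to the superscripted text is an assumption, since the standard above does not spell it out.

```python
SUPERSCRIPT_MARK = "^"  # signals superscripted text
SIGNATURE_MARK = "¥"    # stands in for a signature


def count_markers(lines: list[str]) -> dict[str, int]:
    """Count superscript and signature markers across transcribed lines."""
    return {
        "superscript": sum(line.count(SUPERSCRIPT_MARK) for line in lines),
        "signature": sum(line.count(SIGNATURE_MARK) for line in lines),
    }


if __name__ == "__main__":
    # Made-up lines for illustration only; not taken from the dataset.
    sample = ["M^e Durand", "¥", "le 3 mars"]
    print(count_markers(sample))
```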
Training
The model was trained using Unicode NFD normalization.
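NFD decomposes accented characters into a base letter followed by combining marks, which matters for French text. The sketch below uses Python's standard `unicodedata` module to show the effect; the helper name `to_nfd` is illustrative. Applying the same normalization to reference text before comparing it with the model's output is an assumption about typical usage, not something stated in this record.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of a string."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    composed = "\u00e9"  # "é" stored as a single precomposed code point
    decomposed = to_nfd(composed)
    # The precomposed form has one code point, the decomposed form has two.
    print(len(composed), len(decomposed))
    # List the Unicode names of the decomposed code points.
    print([unicodedata.name(ch) for ch in decomposed])
```

Because the two forms render identically but compare as different strings, mixing them can inflate character error counts.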
Credits
The model was trained by Alix Chagué using data created by Aurélia Rostaing, Françoise Limon-Bonnet, Nathalie Denis and Marc Durand.
Additional information
- more information on the LECTAUREP Project can be found at https://lectaurep.hypotheses.org/
- more information on the model can be found at https://github.com/lectaurep/lectaurep_base_model
Files
metadata.json
(16.1 MB)
Checksum | Size |
---|---|
md5:f6c7f613931dce656a163756eb2b56de | 16.1 MB |
md5:3e37fc80c0dfd089024c8173e9528a99 | 2.3 kB |
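The MD5 values listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the local file path in the usage comment are hypothetical, since the file names are not given here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage, assuming the larger file was saved under a hypothetical local name:
# matches_checksum(Path("downloaded_model_file"),
#                  "md5:f6c7f613931dce656a163756eb2b56de")
```

MD5 is adequate for detecting corrupted downloads, though it is not a defense against deliberate tampering.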