Published May 12, 2022 | Version 1.0.0
Other Open

LECTAUREP Contemporary French Model (Administration)

Creators

  • 1. ALMAnaCH, Inria

Contributors

Contact person:

Data curator:

  • 1. Alix
  • 2. Aurélia
  • 3. Marie-Françoise
  • 4. Nathalie
  • 5. Marc

Description

The model was trained on the ground truth produced by the LECTAUREP Project (Inria & Archives Nationales) between 2019 and 2022. The training dataset contained many examples of handwriting taken from French administrative documents produced between 1742 and 1928.

Training and Testing datasets

The data was collected from LECTAUREP's ground truth repositories:
- lectaurep-bronod v0.0.1
- lectaurep-mariages-et-divorces v.1.0
- lectaurep-repertoires v2.0

Twelve pages were set aside to create a test set.

The training dataset contained:
- 308 files
- 19 364 lines
- 329 270 characters

The test dataset contained:
- 12 files
- 962 lines
- 15 243 characters

 

Transcription standards

The transcriptions were created with eScriptorium. They follow the source closely: abbreviations are not expanded and capitalization follows 19th-century practices. Superscripted portions of text are signaled by `^`, and many signatures are transcribed as ¥.

Training

The model was trained using the NFD normalization.
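
In practice, NFD (canonical decomposition) stores an accented letter as a base letter followed by a combining mark, so text produced by a model trained this way will presumably be in decomposed form. The sketch below uses Python's standard `unicodedata` module to show the effect. The `to_nfd` helper is a hypothetical name, and the assumption that reference text should be normalized the same way before comparison is ours, not a documented requirement of the model.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return the text in Unicode NFD (canonical decomposition) form."""
    return unicodedata.normalize("NFD", text)


# "Françoise" written with the precomposed character U+00E7 (ç).
composed = "Fran\u00e7oise"
decomposed = to_nfd(composed)

# The decomposed form has one extra code point: "c" + U+0327 (combining cedilla).
print(len(composed), len(decomposed))

# Both strings render identically but compare as different
# unless they are normalized to the same form first.
print(composed == decomposed)
print(to_nfd(composed) == to_nfd(decomposed))
```

Normalizing both the prediction and the reference to the same form before computing a character error rate avoids counting visually identical characters as errors.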

Credits

The model was trained by Alix Chagué using data created by Aurélia Rostaing, Marie-Françoise Limon-Bonnet, Nathalie Denis and Marc Durand.

Additional information

- more information on the LECTAUREP Project can be found at https://lectaurep.hypotheses.org/
- more information on the model can be found at https://github.com/lectaurep/lectaurep_base_model

Files

metadata.json

Files (16.1 MB)

- 16.1 MB, md5:f6c7f613931dce656a163756eb2b56de
- 2.3 kB, md5:3e37fc80c0dfd089024c8173e9528a99