CATMuS Gothic Print
Contributors
Data collectors:
Researcher:
Description
CATMuS-Gothic-Print: an OCR model for prints in Gothic typefaces and in 16th-century French.
CATMuS Gothic Print is a Kraken OCR model fine-tuned from the CATMuS Medieval model on data produced by the SETAF project. The SETAF data consist of prints set in Gothic typefaces and written in 16th-century French.
Transcriptions follow graphematic principles and aim to be as compatible as possible with the guidelines previously published for French: ligatures are not transcribed (except those still in use today), allographic variants are not distinguished, and abbreviations are not expanded.
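As an illustration only, the rules above can be sketched as a character-level mapping. The table below is a hypothetical example chosen for this sketch (long s, r rotunda, the "fi" and long-s-t ligatures); it is not the actual inventory used by SETAF or CATMuS. Ligatures still used in modern French, such as "œ", and abbreviation marks such as a combining tilde are deliberately left untouched.

```python
# HYPOTHETICAL mapping for illustration; not the official SETAF/CATMuS table.
GRAPHEMATIC_MAP = {
    "\u017f": "s",    # LATIN SMALL LETTER LONG S -> s (allograph)
    "\ua75b": "r",    # LATIN SMALL LETTER R ROTUNDA -> r (allograph)
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI -> f + i (obsolete ligature)
    "\ufb05": "st",   # LATIN SMALL LIGATURE LONG S T -> s + t (obsolete ligature)
}

_TABLE = str.maketrans(GRAPHEMATIC_MAP)


def apply_graphematic_rules(text: str) -> str:
    """Fold allographs and split obsolete ligatures.

    Characters not in the table (e.g. "œ", or an abbreviation mark such as
    U+0303 COMBINING TILDE) are kept as-is, so abbreviations stay unexpanded.
    """
    return text.translate(_TABLE)


print(apply_graphematic_rules("\u017feigneur"))  # long-s spelling of "seigneur"
print(apply_graphematic_rules("q\u0303"))        # abbreviation kept unexpanded
```

Using `str.translate` keeps the mapping declarative, so a different or larger inventory can be swapped in without changing the function.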
The model is trained with NFD Unicode normalization: each diacritic (including superscript letters) is transcribed as its own character, separate from the "main" character.
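A minimal sketch of what NFD means in practice, using Python's standard `unicodedata` module: precomposed letters such as "ç" and "à" are split into a base letter plus a combining mark. Normalizing ground truth or comparison text the same way (an assumption about how one might evaluate the model, not a documented step of this project) keeps character counts consistent with the model's output.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base letter + combining marks."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> list[str]:
    """Describe each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]


word = "\u00e7\u00e0"  # "çà", two precomposed characters
decomposed = to_nfd(word)
print(code_points(word))
print(code_points(decomposed))
# NFC recomposes the text, e.g. for display.
print(unicodedata.normalize("NFC", decomposed) == word)
```

The decomposed string has four code points: "c", COMBINING CEDILLA, "a", COMBINING GRAVE ACCENT.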
All source datasets and papers are referenced in the Related works section; all transcribers are listed in the Contributors section, and partner-project members are listed as authors.
Funding
- Projet SETAF, University of Geneva (IHR), FNS 205056
Files (22.9 MB)
Name | Size | MD5 checksum |
---|---|---|
 | 22.9 MB | a82371504c47403dc653495716eb149f |
metadata-catmus-gothic-print-1.0.0.json | 2.9 kB | 5665a988d8c1879dc0c0c78e9690c6b0 |
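The MD5 checksums above can be used to check that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib`; the local file path is whatever name the file was saved under, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's MD5 matches the expected hex string."""
    return md5sum(path) == expected.strip().lower()


# Example (assumes the metadata file was saved in the current directory):
# verify(Path("metadata-catmus-gothic-print-1.0.0.json"),
#        "5665a988d8c1879dc0c0c78e9690c6b0")
```

Reading in chunks keeps memory use flat even for the 22.9 MB model file.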
Additional details
Related works
- Is derived from
  - Dataset: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle
  - Dataset: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel
  - Dataset: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH
- Is documented by
  - Working paper: https://hal.science/hal-04281804
  - Conference proceeding: https://hal.science/hal-04555002
Dates
- Available: 2024-01-31
References
- Sonia Solfrini et al. Océriser les imprimés du XVIe siècle en langue française : le cas d'un corpus romand en caractères gothiques. Humanistica 2024, Association francophone des humanités numériques, May 2024, Meknès, Morocco. ⟨hal-04555002⟩.