CATMuS Gothic Print
Contributors
Data collectors:
Researcher:
Description
CATMuS Gothic Print: an OCR model for 16th-century French prints in Gothic typefaces.
CATMuS Gothic Print is a Kraken OCR model fine-tuned from the CATMuS Medieval model on data produced by the SETAF project. The SETAF data consist of prints in Gothic typefaces, in 16th-century French.
Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligatures (except those that still exist), no allographetic variants, abbreviations are not resolved.
The model is trained with NFD Unicode normalization: each diacritic (including superscripts) is transcribed as its own character, separately from the "main" character.
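Because the model output is in NFD, any text compared against it (for instance a ground-truth transcription used to compute an error rate) should probably be normalized the same way first. The following is a minimal sketch using only the Python standard library; the helper name `to_nfd` and the sample strings are illustrative assumptions and are not taken from the SETAF data.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return `text` in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> list[str]:
    """List the code points of `text` as U+XXXX strings, for inspection."""
    return [f"U+{ord(char):04X}" for char in text]


# Illustrative sample: a precomposed "e with tilde" (U+1EBD), as it might
# appear in a transcription that keeps abbreviation marks unresolved.
precomposed = "\u1ebd"
decomposed = to_nfd(precomposed)

# After NFD the base letter and the combining tilde are separate characters.
print(code_points(precomposed))  # ['U+1EBD']
print(code_points(decomposed))   # ['U+0065', 'U+0303']
```

A precomposed string and its NFD form render identically but do not compare equal, so skipping this step can inflate character error rates without any visible difference in the text.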
All source datasets and papers are referenced in the related works section, all transcribers are mentioned in the collaborators section and partner-project members are mentioned as authors.
Funding
- SETAF project, University of Geneva | IHR, SNSF grant no. 205056.
Files
metadata-catmus-gothic-print-1.0.0.json
| MD5 checksum | Size |
|---|---|
| a82371504c47403dc653495716eb149f | 22.9 MB |
| 5665a988d8c1879dc0c0c78e9690c6b0 | 2.9 kB |
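The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The sketch below uses only the Python standard library; the function names and the local path in the usage comment are hypothetical, since the record does not state where or under what name the files are saved.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Return True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (the path is a hypothetical local file name, not from the record):
# matches_checksum("downloaded_model_file", "a82371504c47403dc653495716eb149f")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a defence against deliberate tampering.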
Additional details
Related works
- Is derived from
- Dataset: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle
- Dataset: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel
- Dataset: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH
- Is documented by
- Working paper: https://hal.science/hal-04281804
- Conference proceeding: https://hal.science/hal-04555002
Funding
- Swiss National Science Foundation
- S’en tenir aux 'Faits de Jésus Christ et du pape'. Les imprimés évangéliques romands et les pratiques de communication religieuse à l’époque de la Réforme (SETAF) 205056
Dates
- Available: 2024-01-31
References
- Sonia Solfrini, Simon Gabay, Maxime Humeau, Ariane Pinche, Pierre-Olivier Beaulnes, Aurélia Marques Oliveira, Geneviève Gross, Daniela Solfaroli Camillocci. « Océriser les imprimés du XVIe siècle en langue française : le cas d'un corpus romand en caractères gothiques », Humanistica 2024, Association francophone des humanités numériques, mai 2024, Meknès, Maroc. ⟨hal-04555002⟩.