Published January 31, 2024 | Version 1.0.0
Model Open

CATMuS Gothic Print

  • 1. University of Geneva

Description

CATMuS Gothic Print: an OCR model for 16th-century French prints in Gothic typefaces.

CATMuS Gothic Print is a Kraken OCR model fine-tuned from the CATMuS Medieval model on data produced by the SETAF project. The SETAF data consist of prints in Gothic typefaces, written in 16th-century French.

Transcriptions follow graphematic principles and aim to be as compatible as possible with guidelines previously published for French: no ligatures (except those that still exist), no allographetic variants, and no resolution of abbreviations.

The model is trained with NFD Unicode normalization: each diacritic (including superscripts) is transcribed as its own character, separately from the "main" character.
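As an illustration of what NFD normalization implies for anyone comparing the model's output with their own transcriptions, here is a minimal Python sketch using the standard library `unicodedata` module. The helper name `to_nfd` and the sample strings are assumptions made for this example; they are not taken from the SETAF data or from the model's training pipeline.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode NFD form (base characters followed by combining marks)."""
    return unicodedata.normalize("NFD", text)


# Illustrative sample: "é" stored as the single precomposed code point U+00E9.
precomposed = "\u00e9"
decomposed = to_nfd(precomposed)

# After NFD the accent is a separate combining character.
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']

# A reference transcription typed in precomposed form (NFC) should be
# normalized the same way before being compared with NFD output.
reference = "pr\u00e9dication"  # hypothetical reference string
prediction = "pre\u0301dication"  # hypothetical NFD output
print(reference == prediction)  # False: different code point sequences
print(to_nfd(reference) == to_nfd(prediction))  # True once both are NFD
```

Normalizing both sides before computing character error rates avoids counting a precomposed letter and its decomposed equivalent as a mismatch.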

All source datasets and papers are referenced in the related works section, all transcribers are mentioned in the collaborators section and partner-project members are mentioned as authors.

Files

metadata-catmus-gothic-print-1.0.0.json

Files (22.9 MB)

Name Size
md5:a82371504c47403dc653495716eb149f 22.9 MB
md5:5665a988d8c1879dc0c0c78e9690c6b0 2.9 kB
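The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path in the usage comment are hypothetical, since the record does not state how the files are named once downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:a823...'."""
    expected = listed.removeprefix("md5:").lower()
    return md5_of_file(path) == expected


# Usage (the path is a placeholder for wherever the model file was saved):
# matches_listing("path/to/downloaded_model", "md5:a82371504c47403dc653495716eb149f")
```

Reading in chunks keeps memory use low for the 22.9 MB model file; `str.removeprefix` requires Python 3.9 or later.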

Additional details

Funding

Swiss National Science Foundation
S’en tenir aux 'Faits de Jésus Christ et du pape'. Les imprimés évangéliques romands et les pratiques de communication religieuse à l’époque de la Réforme (SETAF), grant no. 205056

Dates

Available
2024-01-31

References

  • Sonia Solfrini, Simon Gabay, Maxime Humeau, Ariane Pinche, Pierre-Olivier Beaulnes, Aurélia Marques Oliveira, Geneviève Gross, Daniela Solfaroli Camillocci. « Océriser les imprimés du XVIe siècle en langue française : le cas d'un corpus romand en caractères gothiques », Humanistica 2024, Association francophone des humanités numériques, mai 2024, Meknès, Maroc. ⟨hal-04555002⟩.