Published January 31, 2024 | Version 1.0.0
Model Open

CATMuS Gothic Print

  • 1. University of Geneva

Description

CATMuS-Gothic-Print: an OCR model for prints in Gothic typefaces and 16th-century French.

CATMuS Gothic Print is a Kraken OCR model fine-tuned from the CATMuS Medieval model with data produced by the SETAF project. The SETAF data are prints set in Gothic typefaces and written in 16th-century French.

Transcriptions follow graphematic principles and aim to be as compatible as possible with the guidelines previously published for French: no ligatures (except those that still exist), no allographetic variants, and no resolution of abbreviations.

The model is trained with NFD Unicode normalization: each diacritic (including superscripts) is transcribed as its own character, separately from the "main" character.
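To illustrate what NFD normalization means for anyone comparing the model's output with their own transcriptions, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `to_nfd` and `same_transcription` are hypothetical and not part of the model or of Kraken; the sketch only assumes that reference text may arrive in precomposed (NFC) form.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base character + combining marks."""
    return unicodedata.normalize("NFD", text)


def same_transcription(predicted: str, reference: str) -> bool:
    """Compare two transcriptions after bringing both to NFD (hypothetical helper)."""
    return to_nfd(predicted) == to_nfd(reference)


# A precomposed "é" (U+00E9) becomes "e" (U+0065) followed by
# the combining acute accent (U+0301).
precomposed = "\u00e9"
decomposed = to_nfd(precomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']
```

Normalizing both sides before computing any character-level comparison avoids counting a precomposed character and its decomposed equivalent as a mismatch.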

All source datasets and papers are referenced in the related works section; all transcribers are mentioned in the collaborators section, and partner-project members are listed as authors.

Funding

  • SETAF project, University of Geneva (IHR), FNS 205056

Files

metadata-catmus-gothic-print-1.0.0.json

Files (22.9 MB)

Checksum Size
md5:a82371504c47403dc653495716eb149f 22.9 MB
md5:5665a988d8c1879dc0c0c78e9690c6b0 2.9 kB
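The MD5 checksums listed above can be used to check that a download completed intact. The following is a minimal sketch using only Python's standard library; the function names and the example file path are hypothetical, since the record does not show the file name of the model itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example call (the path is a placeholder for wherever the file was saved):
# matches_listed_md5("downloaded-model-file", "md5:a82371504c47403dc653495716eb149f")
```

A result of `False` suggests the file was truncated or altered in transit and should be downloaded again.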

Additional details

Dates

Available
2024-01-31

References

  • Sonia Solfrini et al. Océriser les imprimés du XVIe siècle en langue française : le cas d'un corpus romand en caractères gothiques. Humanistica 2024, Association francophone des humanités numériques, May 2024, Meknès, Morocco. ⟨hal-04555002⟩.