CATMuS Medieval
Creators
- Pinche, Ariane (Supervisor) [1]
- Clérice, Thibault (Project manager) [2]
- Chagué, Alix (Project manager) [2]
- Camps, Jean-Baptiste (Supervisor) [3]
- Vlachou-Efstathiou, Malamatenia (Data collector) [4]
- Gille Levenson, Matthias (Supervisor) [1, 5]
- Brisville-Fertin, Olivier (Supervisor) [1, 6]
- Boschetti, Federico (Supervisor) [7]
- Fischer, Franz (Supervisor) [8]
- Gervers, Michael (Supervisor) [9]
- Boutreux, Agnès (Data collector) [9]
- Manton, Avery (Supervisor) [9]
- Gabay, Simon (Supervisor) [10]
- 1. Histoire, Archéologie, Littératures des Mondes Chrétiens et Musulmans Médiévaux
- 2. French Institute for Research in Computer Science and Automation
- 3. École Nationale des Chartes
- 4. Institut de Recherche et d'Histoire des Textes
- 5. Centre Jean Mabillon
- 6. École Normale Supérieure de Lyon
- 7. Institute for Computational Linguistics “A. Zampolli”
- 8. Ca' Foscari University of Venice
- 9. University of Toronto
- 10. University of Geneva
Contributors
Data collectors:
- Bordier, Julie
- Glaise, Anthony
- Alba, Rachele [1]
- Rubin, Giorgia [1]
- White, Nick [2]
- Karaisl, Antonia [2]
- Leroy, Noé [3]
- Maulu, Marco [4]
- Biay, Sébastien
- Cappe, Zoé [5]
- Konstantinova, Kristina [5]
- Boby, Victor [5]
- Christensen, Kelly [6]
- Pierreville, Corinne [7, 8]
- Aruta, Davide [9]
- Lenzi, Martina [3]
- Le Huëron, Armelle [7]
- Mariotti, Violetta
- Nolibois, Alice
- Deleville, Prunelle [11]
- Carnaille, Camille [11]
- Lecomte, Sophie [12]
- Meylan, Aminoel [11]
- Ventura, Simone [12]
- Dugaz, Lucien [5]
- 1. Ca' Foscari University of Venice
- 2. rescribe.xyz
- 3. Centre Jean Mabillon
- 4. Università degli Studi di Sassari
- 5. École Nationale des Chartes
- 6. French Institute for Research in Computer Science and Automation
- 7. Histoire, Archéologie, Littératures des Mondes Chrétiens et Musulmans Médiévaux
- 8. Jean Moulin University Lyon 3
- 9. Lumière University Lyon 2
- 10. Université de Genève
- 11. University of Geneva
- 12. Université Libre de Bruxelles
Description
CATMuS (Consistent Approach to Transcribing ManuScript) Medieval is a Kraken HTR model trained on strictly graphematic transcriptions in four languages (in descending order of importance in the dataset: Old and Middle French, Latin, Spanish (and other languages of Spain), Italian). Abbreviations are not resolved.
This model is the result of a collaboration between researchers from the CREMMA, GalliCorpora, HTRomance and DEEDS projects. It follows the CREMMA Guidelines (supplemented by the CREMMA Medii Aevi) and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.
The model is trained with NFD Unicode normalization: each diacritic (including superscripts) is transcribed as its own character, separately from the "main" character.
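As a rough illustration of what NFD normalization means for anyone comparing their own text with the model's output, the sketch below uses Python's standard `unicodedata` module. The helper names `to_nfd` and `describe` are hypothetical, and the sample strings (a precomposed "é", and "q" followed by U+0363, a combining superscript "a") are assumptions chosen for illustration, not examples taken from the training data.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into a base character plus combining marks."""
    return unicodedata.normalize("NFD", text)


def describe(text: str) -> list[str]:
    """Return the Unicode name of each code point after NFD normalization."""
    return [unicodedata.name(ch) for ch in to_nfd(text)]


# A precomposed "é" (one code point) becomes "e" + a combining acute accent.
print(describe("\u00e9"))
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Hypothetical example: a combining superscript letter is already its own
# code point, so NFD leaves the sequence unchanged.
print(to_nfd("q\u0363") == "q\u0363")
```

Ground-truth text in another form (for example NFC) would need to go through the same normalization before being compared character by character with the model's output.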
Metrics
- 3,361,410 characters
- 113,228 lines
- 1,602 files (single or double pages, without distinction)
- 7,560 regions
All source datasets and papers are referenced in the related works section; all transcribers are mentioned in the collaborators section; all partner-project members are mentioned as authors.
Funding
- CREMMA, DIM MAP, Région Île-de-France
- CremmaLab, DIM MAP, Région Île-de-France
- GalliCorpora, Datalab, Bibliothèque nationale de France
- HTRomance, Datalab, Bibliothèque nationale de France
- Text as Image, Image as Text: Charter integrity and topic modelling, SSHRCC 1350911
- Les Décades de Bersuire, première traduction française de l'Histoire romaine de Tite-Live – LiBer, ANR 21-CE27-0008
- Projet Fabliaux, Biblissima+, ANR 21-ESRE-0005
Files
metadata.json
Total size: 22.9 MB
Size | Checksum |
---|---|
22.9 MB | md5:11f45c4d63038bd5fd932e5df6c3ae7e |
3.4 kB | md5:582c3ff89f880cc0c03de8fecb7ebaac |
Additional details
Dates
- Created: 2023-11-01
References
- Ariane Pinche. Guide de transcription pour les manuscrits du Xe au XVe siècle. 2022. ⟨hal-03697382⟩
- Thibault Clérice, Malamatenia Vlachou-Efstathiou, Alix Chagué. CREMMA Medii Aevi: Literary manuscript text recognition in Latin. Journal of Open Humanities Data, 2023, 9, pp.4. ⟨10.5334/johd.97⟩. ⟨hal-03828353v5⟩