CATMuS-Print [Large]

Gabay, Simon; Clérice, Thibault

doi:10.5281/zenodo.10592716

Published January 30, 2024 | Version 2024-01-30

Model Open

1. University of Geneva
2. Institut national de recherche en informatique et en automatique

Contributors

Data collector (13):

Researcher:

Chagué, Alix³

1. University of Geneva
2. Université de Strasbourg
3. Institut national de recherche en informatique et en automatique

CATMuS-Print (Large) - Diachronic model for French prints and other West European languages

CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, covering different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian…) and different centuries (from the first prints of the 16th century to digital documents of the 21st century).

Transcriptions follow graphematic principles and aim to be as compatible as possible with guidelines previously published for French: no ligatures (except those still in use), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies may be present, because the transcriptions were produced over several years and the norms evolved slightly during that time.

The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) is transcribed as its own character, separately from the "main" character.
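As an illustration of what NFKD normalization implies for anyone comparing their own text with the model's output, here is a minimal Python sketch using the standard `unicodedata` module. The function names `to_nfkd` and `describe` and the sample string are hypothetical and not part of this record; the sketch only shows how a precomposed character is split into a base letter followed by a combining mark.

```python
import unicodedata


def to_nfkd(text: str) -> str:
    """Return text in NFKD form (compatibility decomposition)."""
    return unicodedata.normalize("NFKD", text)


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode name, for inspection."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


# Hypothetical sample: "été" typed with two precomposed U+00E9 characters.
line = "\u00e9t\u00e9"
decomposed = to_nfkd(line)

# After decomposition each accent is a separate combining character.
for entry in describe(decomposed):
    print(entry)
```

Applying the same normalization to reference transcriptions before computing an error rate is one way to avoid counting precomposed and decomposed forms of the same letter as mismatches.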

This model is the result of a collaboration between researchers from the University of Geneva and Inria Paris, and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.

Files

Files (22.9 MB)

Name	MD5	Size
catmus-print-fondue-large.mlmodel	9ed1ed4a6c34e1f4b292b380bf9c5543	22.9 MB
metadata.json	2332232be0b1d0cfb714ef4f13764345	2.7 kB
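The MD5 values in the listing can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the sketch assumes the model file has already been downloaded to a local path of your choosing.

```python
import hashlib
from pathlib import Path

# MD5 of catmus-print-fondue-large.mlmodel, copied from the file listing.
EXPECTED_MD5 = "9ed1ed4a6c34e1f4b292b380bf9c5543"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path).lower() == expected.lower()


# Example call (the local path is an assumption):
# matches_checksum("catmus-print-fondue-large.mlmodel")
```

A mismatch usually points to an incomplete or corrupted download rather than a problem with the model itself.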

Additional details

Is documented by: Journal article: https://hal.science/hal-02577236

Available: 2024-01-30

	All versions	This version
Views	2,598	2,598
Downloads	4,921	4,921
Data volume	33.8 GB	33.8 GB
