Published January 12, 2026 | Version 1.0
Model
Open
PRIMA HTR
Authors/Creators
Contributors
Annotator (2):
Project leader:
Description
PRIMA HTR Model — Italian Early Modern Manuscripts (late 16th–18th c.)
The PRIMA HTR model was developed within the framework of the ERC project PRIMA — Manuscripts in the Age of Print, hosted at the Centre d’Études Supérieures de la Renaissance (CESR – UMR 7323), Université de Tours, with the support of the LIFAT computer science laboratory (Université de Tours), which provided the high-performance computing infrastructure used for model training.
The model is designed to support large-scale transcription and analysis of Italian manuscript heritage from the late sixteenth to the eighteenth century, with a particular focus on literary, satirical and poetic texts.
We invite scholars and institutions working on early modern Italian manuscripts to use this model and, whenever possible, to publish the resulting transcriptions in open repositories, in order to contribute to the continuous improvement of both the model and the associated training datasets.
Training data and methodology
The model was obtained by fine-tuning a base model, itself trained on a wide range of Latin-script handwritten documents, on a heterogeneous corpus of Italian handwritten sources from the late sixteenth to the eighteenth century, including poetic, satirical, narrative and documentary texts. To increase the diversity of writing styles, the training corpus also incorporates a selection of early modern printed calligraphy manuals, which introduce additional stylistic variation in letterforms and writing practices.
Performance was further improved by adding synthetic training data generated from the manuscript material under study, in order to make the model more robust to scribal variation and layout heterogeneity.
The complete data augmentation workflow is documented in the project GitLab repository: https://scm.univ-tours.fr/cesr/prima/data_augmentation
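The repository above is the authoritative description of the workflow. Purely as an illustration of what generating synthetic variants from existing line images can look like, the sketch below applies a random horizontal shear to a grayscale line image, which mimics differences in the slant of a hand. It is not the PRIMA pipeline: the function names, the list-of-rows image representation and the slant range are all assumptions made for this example.

```python
import random


def shear_line_image(pixels, slant, background=255):
    """Tilt a grayscale line image by shifting rows horizontally.

    `pixels` is a list of equal-length rows of intensity values
    (hypothetical representation chosen for this sketch).
    A positive `slant` pushes upper rows to the right, a negative one
    pushes them to the left. The output is padded with `background`
    so that every row keeps the same width.
    """
    if not pixels:
        return []
    height = len(pixels)
    # One integer offset per row, proportional to the distance from the bottom row.
    offsets = [round(slant * (height - 1 - y)) for y in range(height)]
    lowest = min(offsets)
    span = max(offsets) - lowest
    sheared = []
    for row, offset in zip(pixels, offsets):
        left = offset - lowest  # always between 0 and span
        sheared.append([background] * left + list(row) + [background] * (span - left))
    return sheared


def make_slant_variants(pixels, count, max_slant=0.3, seed=None):
    """Return `count` sheared copies with slants drawn uniformly from [-max_slant, max_slant]."""
    rng = random.Random(seed)
    return [
        shear_line_image(pixels, rng.uniform(-max_slant, max_slant))
        for _ in range(count)
    ]
```

Each variant keeps the transcription of the original line unchanged, so it can be paired with the same ground-truth text. A real workflow would normally combine several such transformations and operate on image files rather than nested lists.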
Source collections
A representative portion of the training corpus is derived from digitized manuscripts preserved in the following institutions (non-exhaustive list):
- Biblioteca Nazionale Centrale di Firenze
- Biblioteca Marucelliana, Firenze
- Biblioteca dell’Archiginnasio, Bologna
- Fondo Joppi, Udine
- Biblioteca Bertoliana, Vicenza
- Biblioteca Angelica, Roma
- Biblioteca Nazionale Centrale di Roma
- Bibliothèque universitaire Droit-Lettres, Université Grenoble Alpes
The authors gratefully acknowledge these institutions for making their collections available for scholarly research.
Transcription and normalization
Normalization and transcription practices strictly follow the PRIMA transcription guidelines deposited in the project GitLab repository: https://scm.univ-tours.fr/cesr/prima/htr
Funding
The PRIMA project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, Grant agreement No. 101142242.
Funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Files
| Checksum | Size |
|---|---|
| md5:628f3cce7dbe9542f8d87408e3f39277 | 16.2 MB |
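The MD5 checksum listed for the file can be used to confirm that a download is complete and unaltered. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the file path in the usage comment is a placeholder, since the file name is not given here.

```python
import hashlib

# Checksum published on this record for the 16.2 MB model file.
EXPECTED_MD5 = "628f3cce7dbe9542f8d87408e3f39277"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()


# Usage (placeholder path, replace with the location of the downloaded file):
# print(matches_published_checksum("path/to/downloaded_model_file"))
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again before use.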
Additional details
Dates
- Created: 2026-01-12
Software
- Repository URL: https://scm.univ-tours.fr/cesr/prima/htr