Published January 12, 2026 | Version 1.0
Model Open

PRIMA HTR

  • 1. Université de Tours
  • 2. Università degli Studi di Firenze

Contributors

Project leader:

  • 1. Université de Tours
  • 2. Università degli Studi di Padova
  • 3. Centre d'Études Supérieures de la Renaissance
  • 4. University of Florence

Description

PRIMA HTR Model — Italian Early Modern Manuscripts (late 16th–18th c.)
The PRIMA HTR model was developed within the framework of the ERC project PRIMA — Manuscripts in the Age of Print, hosted at the Centre d’Études Supérieures de la Renaissance (CESR – UMR 7323), Université de Tours, with the support of the LIFAT computer science laboratory (Université de Tours), which provided the high-performance computing infrastructure used for model training.
The model is designed to support large-scale transcription and analysis of Italian manuscript heritage from the late sixteenth to the eighteenth century, with a particular focus on literary, satirical and poetic texts. 

We invite scholars and institutions working on early modern Italian manuscripts to use this model and, whenever possible, to publish the resulting transcriptions in open repositories, in order to contribute to the continuous improvement of both the model and the associated training datasets.

Training data and methodology
The model was obtained by fine-tuning a base model, itself trained on a wide range of Latin-script handwritten documents, on a heterogeneous corpus of Italian handwritten sources from the late sixteenth to the eighteenth century, including poetic, satirical, narrative and documentary texts. To increase the diversity of writing styles, the training corpus also incorporates a selection of early modern printed calligraphy manuals, which introduce additional variation in letterforms and writing practices.

Performance was further improved by adding synthetic training data generated from the manuscript material under study, in order to increase robustness to scribal variation and layout heterogeneity.
The complete data augmentation workflow is documented in the project GitLab repository: https://scm.univ-tours.fr/cesr/prima/data_augmentation
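
As a purely illustrative sketch of what augmentation for scribal variation can look like, the snippet below applies a horizontal shear to a grayscale line image, which mimics a change in handwriting slant. It is not taken from the PRIMA workflow, which is documented in the repository linked above; the function name `shear_line_image`, its parameters, and the representation of an image as a list of pixel rows are all assumptions made for this example.

```python
def shear_line_image(pixels, slant, background=255):
    """Return a slanted copy of a grayscale line image (hypothetical helper).

    pixels: list of rows, each row a list of gray values (0 = ink, 255 = paper).
    slant: horizontal shift, in pixels per row of height; positive values
           push the top of the line to the right, negative to the left.
    """
    height = len(pixels)
    max_shift = round(abs(slant) * (height - 1))
    sheared = []
    for y, row in enumerate(pixels):
        # Rows nearer the top are shifted further than rows near the baseline.
        offset = round(abs(slant) * (height - 1 - y))
        if slant < 0:
            offset = max_shift - offset
        # Pad with background so every output row has the same width.
        new_row = [background] * offset + list(row) + [background] * (max_shift - offset)
        sheared.append(new_row)
    return sheared
```

Under these assumptions, calling `shear_line_image(image, 0.2)` on a line image yields a variant of the same text with a mild rightward slant, which could then be paired with the original transcription as an extra training sample.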

Source collections
A representative portion of the training corpus is derived from digitized manuscripts preserved in the following institutions (non-exhaustive list):

- Biblioteca Nazionale Centrale di Firenze
- Biblioteca Marucelliana, Firenze
- Biblioteca dell’Archiginnasio, Bologna
- Fondo Joppi, Udine
- Biblioteca Bertoliana, Vicenza
- Biblioteca Angelica, Roma
- Biblioteca Nazionale Centrale di Roma
- Bibliothèque universitaire Droit-Lettres, Université Grenoble Alpes

The authors gratefully acknowledge these institutions for making their collections available for scholarly research.

Transcription and normalization
Normalization and transcription practices strictly follow the PRIMA transcription guidelines deposited in the project GitLab repository: https://scm.univ-tours.fr/cesr/prima/htr

Funding
The PRIMA project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, Grant agreement No. 101142242.
Funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Files

Files (16.2 MB)

md5:628f3cce7dbe9542f8d87408e3f39277 (16.2 MB)
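
The MD5 checksum listed above can be used to confirm that a downloaded copy of the model file is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` are hypothetical, and the local file path must be supplied by the user, since the record as reproduced here does not preserve the file name.

```python
import hashlib

# Checksum as published in the Files section of this record.
EXPECTED_MD5 = "628f3cce7dbe9542f8d87408e3f39277"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

A result of `False` from `verify_download` suggests an incomplete or corrupted download, in which case the file should be fetched again before use.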

Additional details

Dates

Created
2026-01-12