Published September 29, 2021 | Version v1
Dataset Open


  • 1. Institut de Recherche et d'Histoire des Textes (CNRS)
  • 2. Leopold Franzens Universität für Innsbruck


The HIMANIS Guérin dataset provides ground truth for training Handwritten Text Recognition (HTR) models. It covers 1217 images or parts of images and 30015 lines (933 images and 22093 lines in Guérin 1; 284 images and 7922 lines in Guérin 2). It was established as part of the HIMANIS research project in collaboration with the READ consortium (Recognition and Enrichment of Archival Documents).

The base text is the edition by Paul Guérin, Recueil des documents concernant le Poitou contenus dans les registres de la Chancellerie de France, published between 1881 and 1919. The edition was digitized and OCR-processed by the Bibliothèque nationale de France, then encoded by the École nationale des chartes, and finally corrected and enhanced in HIMANIS, especially with regard to abbreviations and links to digital images.

The READ consortium aligned the text line by line in Transkribus for the acts whose coordinates were recorded in the HIMANIS project. These acts come mainly from the registers Paris, Archives nationales, JJ 35 to JJ 91, supplemented by information for vol. 12 of Guérin's edition.

This dataset comprises two Transkribus exports, enriched with links to images accessible via the IIIF protocol; each link is stored in the @corresp attribute of a <graphic/> element.
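
As an illustration of how these image links might be read, here is a minimal Python sketch that collects the @corresp value of every <graphic/> element in an XML document. It is a sketch under stated assumptions, not a description of the actual export: the sample document, its TEI-style namespace, the file names and the example.org URL are all hypothetical, and the function names local_name and iiif_links are introduced here for illustration only. The element name is matched regardless of namespace, because the namespace used in the exports is not stated in this record.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}graphic' -> 'graphic'."""
    return tag.rsplit("}", 1)[-1]


def iiif_links(xml_text):
    """Return the corresp value of every <graphic/> element that has one."""
    root = ET.fromstring(xml_text)
    return [
        el.attrib["corresp"]
        for el in root.iter()
        if isinstance(el.tag, str)
        and local_name(el.tag) == "graphic"
        and "corresp" in el.attrib
    ]


# Hypothetical sample: the real exports may use a different layout,
# namespace, and URL pattern.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <graphic url="page_0001.jpg"
               corresp="https://example.org/iiif/page_0001/info.json"/>
    </surface>
    <surface>
      <graphic url="page_0002.jpg"/>
    </surface>
  </facsimile>
</TEI>"""

links = iiif_links(SAMPLE)
print(links)
```

In this sample only the first <graphic/> carries a @corresp attribute, so the second element is skipped rather than raising an error. For the multi-gigabyte exports, streaming with ET.iterparse would likely be preferable to loading a whole file into memory.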

The historical corpus is described in Stutzmann, Dominique, Jean-François Moufflet, and Sébastien Hamel. « La recherche en plein texte dans les sources manuscrites médiévales : enjeux et perspectives du projet HIMANIS pour l’édition électronique ». Médiévales : Langue, textes, histoire 73 (2017): 67‑96.

The present dataset is the training data for the "HIMANIS Chancery M1+" model.



Files (4.9 GB)

Size
4.0 GB
871.4 MB

Additional details

Related works

Is documented by
10.4000/medievales.8198 (DOI)


Agence Nationale de la Recherche
HIMANIS – Indexation de manuscrits historiques pour une recherche contrôlée par l'utilisateur ANR-15-EPAT-0003
European Commission
READ – Recognition and Enrichment of Archival Documents 674943


  • Stutzmann, Dominique, Jean-François Moufflet, and Sébastien Hamel. « La recherche en plein texte dans les sources manuscrites médiévales : enjeux et perspectives du projet HIMANIS pour l'édition électronique ». Médiévales : Langue, textes, histoire 73 (2017): 67‑96.