HIMANIS Guérin
Creators
- 1. Institut de Recherche et d'Histoire des Textes (CNRS)
- 2. Leopold-Franzens-Universität Innsbruck
Description
The HIMANIS Guérin dataset provides ground truth for training Handwritten Text Recognition (HTR) models. It covers 1217 images or parts of images and 30015 lines (933 images and 22093 lines in Guérin 1; 284 images and 7922 lines in Guérin 2). It was created as part of the HIMANIS research project in collaboration with the READ consortium (Recognition and Enrichment of Archival Documents).
The base text is the edition by Paul Guérin, Recueil des documents concernant le Poitou contenus dans les registres de la Chancellerie de France, published between 1881 and 1919. The edition was digitized and OCR-processed by the Bibliothèque nationale de France, then encoded by the Ecole nationale des Chartes (http://corpus.enc.sorbonne.fr/actesroyauxdupoitou/), and finally corrected and enhanced in HIMANIS, especially with regard to abbreviations and links to digital images (https://github.com/oriflamms/himanis/blob/master/Editions/Guerin_tome1-tome12.xml).
The text was aligned line by line in Transkribus by the READ consortium for the acts whose coordinates had been recorded in the HIMANIS project, mainly for volumes Paris, Archives nationales, JJ 35 to JJ 91, supplemented by information for vol. 12 of Guérin's edition.
This dataset comprises two Transkribus exports, enriched with links to images accessible via the IIIF protocol. These links are stored in the @corresp attribute of <graphic/> elements.
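As an illustration of how the IIIF links might be collected from an export file, here is a minimal Python sketch. It assumes only what is stated above, namely that the links sit in the @corresp attribute of <graphic/> elements. The sample document, its TEI namespace, the url attribute, and the iiif.example.org address are hypothetical placeholders, not taken from the dataset; the function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iiif_links(xml_text: str) -> list[str]:
    """Return the @corresp value of every <graphic/> element that has one."""
    root = ET.fromstring(xml_text)
    return [
        el.attrib["corresp"]
        for el in root.iter()
        if local_name(el.tag) == "graphic" and "corresp" in el.attrib
    ]


# Hypothetical sample: the real exports may use a different layout or namespace.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <graphic url="page_0001.jpg"
               corresp="https://iiif.example.org/image/page_0001/full/full/0/default.jpg"/>
    </surface>
    <surface>
      <graphic url="page_0002.jpg"/>
    </surface>
  </facsimile>
</TEI>"""

links = iiif_links(SAMPLE)
print(links)
```

Matching on the local element name rather than a fixed namespace keeps the sketch usable whether or not the export declares a namespace; <graphic/> elements without @corresp are simply skipped.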
The historical corpus is described in Stutzmann, Dominique, Jean-François Moufflet, and Sébastien Hamel. « La recherche en plein texte dans les sources manuscrites médiévales : enjeux et perspectives du projet HIMANIS pour l’édition électronique ». Médiévales : Langue, textes, histoire 73 (2017): 67‑96. https://doi.org/10.4000/medievales.8198.
The present dataset is the training data for the "HIMANIS Chancery M1+" model; cf. https://readcoop.eu/model/french-and-latin-chancery-documents/
Files (4.9 GB)
Guerin(1).zip
MD5 checksum | Size |
---|---|
c3f9ed153f96e0ec7c1e1e52aaa9b413 | 4.0 GB |
5e9d71408ab45055e616052fee0ee270 | 871.4 MB |
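The MD5 checksums listed above can be used to check a download for corruption. The following Python sketch shows one way to do so, assuming the archives have been saved locally. The function names and the chunk size are illustrative choices, and since the record does not state which checksum belongs to which file name, the check only tests membership in the set of published values.

```python
import hashlib

# MD5 checksums as published in the file list of this record.
PUBLISHED_MD5 = {
    "c3f9ed153f96e0ec7c1e1e52aaa9b413",  # 4.0 GB archive
    "5e9d71408ab45055e616052fee0ee270",  # 871.4 MB archive
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True if the file's MD5 digest is one of the published values."""
    return md5_of(path) in PUBLISHED_MD5
```

For example, calling `matches_published("Guerin(1).zip")` on a local copy would return True only if the file is byte-identical to one of the two published archives. Reading in chunks avoids loading a multi-gigabyte file into memory at once.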
Additional details
Related works
- Is documented by
- 10.4000/medievales.8198 (DOI)
References
- Stutzmann, Dominique, Jean-François Moufflet, and Sébastien Hamel. « La recherche en plein texte dans les sources manuscrites médiévales : enjeux et perspectives du projet HIMANIS pour l'édition électronique ». Médiévales : Langue, textes, histoire 73 (2017): 67‑96. https://doi.org/10.4000/medievales.8198