HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th)

Torres Aguilar, Sergio; Jolivet, Vincent

doi:10.5281/zenodo.7547438

Published January 18, 2023 | Version 1

Dataset Open

1. University of Luxembourg
2. École nationale des chartes

This is the best-performing HTR model for Latin and French documentary manuscripts presented in the paper: Sergio Torres Aguilar, Vincent Jolivet. Handwritten Text Recognition for Documentary Medieval Manuscripts. 2022. https://hal.science/hal-03892163

The model was trained on a dataset of charters and registers from the late medieval period (12th-15th centuries). Training and evaluation, covering 1855 pages, 120k lines of text and almost 1M tokens, used three freely available ground-truth corpora:

The Alcar-HOME database: https://zenodo.org/record/5600884

The e-NDP corpus: https://zenodo.org/record/7575693

The Himanis project: https://zenodo.org/record/5535306

This final model operates in a multilingual environment (Latin and Old French) and can recognize several Latin script families (mostly Textualis and Cursiva) in documents produced between ca. the 12th and 15th centuries. During evaluation, the model shows an accuracy of 94.1% on the validation set and a CER (character error rate) of about 0.12 to 0.17 on four external unseen datasets. Fine-tuning with 10 ground-truth pages can lower the CER on those datasets to between 0.06 and 0.10.
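For readers unfamiliar with the metric, the sketch below shows one common way to compute a CER: the character-level Levenshtein (edit) distance between a reference transcription and the model output, divided by the length of the reference. Under this definition, a CER of 0.12 corresponds to roughly 12 character edits per 100 reference characters. The function names `levenshtein` and `cer` are illustrative, and the sketch assumes no text normalization; the exact evaluation procedure used by the authors is described in the paper and may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # drop a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

In practice the edit distances and reference lengths are usually summed over all lines of a test set before dividing, so that long lines weigh more than short ones.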

Files (23.7 MB)

Name	MD5	Size
HTR_medieval_documentary_best.mlmodel	27b9163851c3f5bb08160434697d6b9e	23.7 MB
metadata.json	46d3c7d3d8921caebe2dd910462e62e1	1.3 kB
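The MD5 checksums listed with the files can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the local file path are assumptions for illustration, not part of the record.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed local path: the model file saved in the current directory.
    expected = "27b9163851c3f5bb08160434697d6b9e"
    actual = md5_of_file("HTR_medieval_documentary_best.mlmodel")
    print("checksum OK" if actual == expected else "checksum mismatch")
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.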

Additional details

Cites (preprint): https://hal.science/hal-03892163

