There is a newer version of the record available.

Published January 18, 2023 | Version 1
Dataset Open

HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th)

  • 1. University of Luxembourg
  • 2. École nationale des chartes

Description

This is the best-performing HTR model for Latin and French documentary manuscripts presented in the paper: Sergio Torres Aguilar, Vincent Jolivet. Handwritten Text Recognition for Documentary Medieval Manuscripts. 2022. https://hal.science/hal-03892163

The model was trained on a dataset of charters and registers from the late medieval period (12th-15th centuries). Training and evaluation covered 1855 pages, 120k lines of text and almost 1M tokens, and were conducted using three freely available ground-truth corpora:

The Alcar-HOME database: https://zenodo.org/record/5600884

The e-NDP corpus: https://zenodo.org/record/7575693

The Himanis project: https://zenodo.org/record/5535306

This final model operates in a multilingual environment (Latin and Old French) and can recognize several Latin script families (mostly Textualis and Cursiva) in documents produced between ca. the 12th and 15th centuries. During evaluation the model shows an accuracy of 94.1% on the validation set and a CER (character error rate) of about 0.12 to 0.17 on four external unseen datasets. A fine-tuning exercise using 10 ground-truth pages can improve these results to a CER between 0.06 and 0.10.
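For readers unfamiliar with the metric, the sketch below shows one common way to compute a CER: the character-level edit distance between a reference transcription and the model output, divided by the reference length. This is an illustration under that assumption, not the evaluation code used in the paper, which may normalize text or aggregate lines differently. The function names `levenshtein` and `cer` and the sample strings are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a ratio (0.10 means 10% of characters wrong).
    Assumes normalization by the reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: one dropped character in a 16-character line.
reference = "in nomine domini"
hypothesis = "in nomine domni"
print(cer(reference, hypothesis))
```

Read this way, a CER of 0.12 corresponds to roughly 12 character errors per 100 reference characters, and 0.06 to roughly 6.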

Files

metadata.json

Files (23.7 MB)

Checksum Size
md5:27b9163851c3f5bb08160434697d6b9e 23.7 MB
md5:46d3c7d3d8921caebe2dd910462e62e1 1.3 kB
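The md5 values in the file listing can be used to check that a downloaded copy is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local path `downloaded_model_file` are placeholders, since the listing does not show the file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks so that
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>'
    (a bare hex string is also accepted)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; the checksum is the one listed for the 23.7 MB file):
# matches_listing("downloaded_model_file", "md5:27b9163851c3f5bb08160434697d6b9e")
```

A result of `False` would suggest an incomplete or corrupted download, or a file from a different version of the record.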

Additional details

Related works