Published November 27, 2023 | Version v1

Enhancing HTR of Historical Texts through Scholarly Editions: A Case Study from an Ancient Collation of the Hebrew Bible

  • 1. École Pratique des Hautes Études
  • 2. Université Paris Sciences et Lettres

Contributors

  • 1. Tel Aviv University
  • 2. Monash University

Description

Printed critical editions of literary texts are a largely neglected source of knowledge in computational humanities. However, under certain conditions, they hold significant potential for multifaceted exploration. First, through Optical Character Recognition (OCR) of the text and its apparatus, coupled with intelligent parsing of the variant readings, it becomes possible to reconstruct comprehensive manuscript collations, which can prove invaluable for a variety of investigations, including phylogenetic analyses, redaction history studies, and linguistic inquiries. Second, by aligning the printed edition with manuscript images, a substantial amount of Handwritten Text Recognition (HTR) ground truth can be generated. This serves as valuable material for paleography and layout analysis, as well as for assessing the quality of the collation criteria adopted by the editor. The present paper focuses on the challenges mastered in the OCR, the apparatus parsing, the text reconstruction, and the alignment with the manuscript images, taking as a case study the edition of the Hebrew Bible published by Kennicott in the late eighteenth century.
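To make the idea of reconstructing a collation from a parsed apparatus concrete, the following is a minimal Python sketch, not the method used in the paper or in the linked repository. It assumes a strongly simplified model in which each apparatus entry has already been parsed into a word position in the base text, a replacement reading (an empty string standing for an omission), and a set of manuscript sigla. The names `Variant`, `reconstruct`, and `collate`, the sigla `K1` to `K3`, and the placeholder words are all hypothetical; a real apparatus such as Kennicott's also records additions, transpositions, and abbreviated notation that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One parsed apparatus entry (hypothetical, simplified model)."""
    position: int         # index of the affected word in the base text
    reading: str          # replacement reading; "" marks an omission
    witnesses: frozenset  # sigla of the manuscripts attesting the reading


def reconstruct(base, variants, siglum):
    """Rebuild the text of one witness from the base text and the apparatus."""
    words = list(base)
    for v in variants:
        if siglum in v.witnesses:
            words[v.position] = v.reading
    # Drop the words that the witness omits.
    return [w for w in words if w]


def collate(base, variants, sigla):
    """Return a mapping from each siglum to its reconstructed text."""
    return {s: reconstruct(base, variants, s) for s in sigla}


# Placeholder data, not real readings from any edition.
base = ["alpha", "beta", "gamma"]
variants = [
    Variant(1, "BETA", frozenset({"K1", "K2"})),  # substitution in K1 and K2
    Variant(2, "", frozenset({"K2"})),            # omission in K2 only
]
table = collate(base, variants, ["K1", "K2", "K3"])
for siglum, words in table.items():
    print(siglum, " ".join(words))
```

A witness not mentioned in any entry (here `K3`) is assumed to agree with the base text, which is the usual convention of a negative apparatus; whether that assumption holds for a given edition has to be checked against its editorial principles.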

Files

bambaci_et_stoekl_CRH_Paris_paper6310.pdf (7.4 MB)
md5:890ddb799fab9d42439b9d093462338f

Additional details

Funding

European Research Council
ERC Synergy MiDRASH 101071829
Agence Nationale de la Recherche
Equipex Biblissima+ ANR-21-ESRE-0005

Dates

Created
2023-11-27

Software

Repository URL
https://github.com/LuigiBambaci/REK
Programming language
ANTLR, Python
Development Status
Active