Enhancing HTR of Historical Texts through Scholarly Editions: A Case Study from an Ancient Collation of the Hebrew Bible
Description
Printed critical editions of literary texts are a largely neglected source of knowledge in computational humanities. Under certain conditions, however, they hold significant potential for multifaceted exploration. First, Optical Character Recognition (OCR) of the text and its apparatus, coupled with intelligent parsing of the variant readings, makes it possible to reconstruct comprehensive manuscript collations, which can prove invaluable for phylogenetic analyses, redaction history studies, linguistic inquiries, and more. Second, aligning the printed edition with manuscript images generates a substantial amount of Handwritten Text Recognition (HTR) ground truth. This ground truth is valuable material for paleography and layout analysis, as well as for assessing the quality of the collation criteria adopted by the editor. The present paper focuses on the challenges overcome in the OCR, the apparatus parsing, the text reconstruction, and the alignment with the manuscript images, taking as a case study the edition of the Hebrew Bible published by Kennicott in the late eighteenth century.
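As a rough illustration of the text reconstruction step, the sketch below applies parsed variant readings to a base text to recover the wording of a single witness. It is a minimal sketch under stated assumptions, not the method implemented in the REK repository: the `Variant` record, the `reconstruct` function, the word-index addressing, and the sigla `K1` and `K2` are all hypothetical, and the tokens are placeholders rather than readings from Kennicott's edition.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Variant:
    """One apparatus entry (hypothetical structure, for illustration only)."""
    position: int              # index of the affected word in the base text
    reading: Optional[str]     # replacement word; None marks an omission
    witnesses: FrozenSet[str]  # sigla of the manuscripts attesting the reading


def reconstruct(base: List[str], variants: List[Variant], siglum: str) -> List[str]:
    """Return the text of one witness by applying its variants to the base text."""
    # Collect only the readings attested by the requested witness.
    own = {v.position: v.reading for v in variants if siglum in v.witnesses}
    text = []
    for i, word in enumerate(base):
        reading = own.get(i, word)  # fall back to the base text
        if reading is not None:     # None means the witness omits the word
            text.append(reading)
    return text


# Toy data: placeholder tokens, not actual readings.
base = ["w1", "w2", "w3", "w4"]
variants = [
    Variant(1, "x2", frozenset({"K1", "K2"})),  # substitution in K1 and K2
    Variant(3, None, frozenset({"K2"})),        # omission in K2 only
]
print(reconstruct(base, variants, "K1"))
print(reconstruct(base, variants, "K2"))
```

Running `reconstruct` once per siglum yields one text per manuscript, which is the shape of data a collation needs. A real apparatus also records additions, transpositions, and multi-word lemmata, none of which this sketch handles.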
Files
| Name | Size | Checksum |
|---|---|---|
| bambaci_et_stoekl_CRH_Paris_paper6310.pdf | 7.4 MB | md5:890ddb799fab9d42439b9d093462338f |
Additional details
Dates
- Created
- 2023-11-27
Software
- Repository URL
- https://github.com/LuigiBambaci/REK
- Programming language
- ANTLR, Python
- Development Status
- Active