Text Restoration of Historical Documents

Shibingfeng, Zhang

doi:10.5281/zenodo.19632820

Published April 17, 2026 | Version v1

Poster | Open Access


This PhD project investigates the application of pre-trained language models (PLMs) to the automated restoration of Latin diplomatic texts, with a focus on medieval notarial documents. The project addresses a significant challenge in historical document studies: the reconstruction of damaged or missing text in low-resource Latin corpora. To this end, it systematically evaluates a range of PLMs that vary in architecture, training language, and scale, in order to identify the most effective approach for this specialised restoration task.

The project is structured around the following research questions:

Does adding Ancient Greek and English to the pre-training data improve performance on Latin text restoration, or is monolingual pre-training exclusively on Latin more effective?

How does the performance of smaller, domain-specific models fine-tuned on Latin compare with that of large-scale commercial language models used with few-shot prompting in Latin text restoration?

The experimental design distinguishes two settings according to whether the length of the missing text is known or unknown. This distinction determines which models are evaluated: encoder-based models, which predict a fixed number of masked positions, suit the known-length setting, while encoder-decoder and decoder-only models, which generate output of variable length, suit the unknown-length setting. Controlled comparisons between model pairs that share identical architectures but differ in training data allow a rigorous assessment of how multilingual pre-training affects downstream Latin text restoration.
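The difference between the two settings can be illustrated with a minimal Python sketch of how a single damaged passage might be turned into model input. This is not the project's preprocessing, which is not described here: the `Lacuna` class, the helper functions, and the `<mask>` and `<gap>` markers are hypothetical names chosen for illustration, and the sketch assumes gaps are measured in characters.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholder markers; real models each define their own special tokens.
MASK = "<mask>"
GAP = "<gap>"


@dataclass
class Lacuna:
    """A damaged passage: the legible text on either side of a gap,
    plus the gold restoration used as the target."""
    left: str
    right: str
    gold: str


def known_length_input(lac: Lacuna) -> str:
    """Known-length setting (encoder / masked-LM style).

    One mask per missing character, so the model only chooses a symbol for
    each position. Assumes character-level units; with a subword tokenizer
    the masks would have to be counted in tokens instead.
    """
    return lac.left + MASK * len(lac.gold) + lac.right


def unknown_length_input(lac: Lacuna) -> tuple[str, str]:
    """Unknown-length setting (encoder-decoder / decoder-only style).

    A single marker replaces the gap, whatever its size; the model must
    generate the missing text and therefore also decide its length.
    Returns a (source, target) pair.
    """
    return lac.left + GAP + lac.right, lac.gold


if __name__ == "__main__":
    # Toy example built from a common notarial invocation formula.
    lac = Lacuna(left="In nomine ", right=" amen.", gold="Domini")
    print(known_length_input(lac))
    print(unknown_length_input(lac))
```

In the first format the size of the gap is handed to the model as six `<mask>` positions for "Domini"; in the second, the model sees only `In nomine <gap> amen.` and must infer how much text is missing, which is why it calls for a generative rather than a purely encoder-based model.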

Files

Text Restoration of Historical Documents.pdf (263.6 kB, md5:6cddbf61be5c89257e41cbf3842bbddf)

Additional details

Funding: European Commission, FutureData4EU (Training Future Big Data Experts for Europe), award no. 101126733

Created: 2026-04-17

