Published June 1, 2026 | Version v1
Presentation Open

Iterative Layout Detection training for Digitising Swedish Medical Periodicals (1781–2011): Fine-Tuning Layout Detection Models in the SweMPer Pipeline

  • 1. Uppsala Universitet

Description

Historical printed periodicals pose particular challenges for digitization: varying typography, inconsistent page layouts, degraded print quality, and diverse content types such as articles, tables, figures, and advertisements. Within the SweMPer project, our goal is to build a national-scale digital archive of Swedish medical periodicals spanning more than two centuries. To ensure high-quality digitization and to make the material accessible and reusable for both researchers and the public, a reliable layout detection and segmentation infrastructure is essential. The work presented here is part of a larger, machine-learning-based pipeline for digitizing historical printed materials.

We describe a human-in-the-loop workflow for fine-tuning layout detection models tailored to the idiosyncrasies of Swedish medical periodicals. The workflow began with scoping discussions with domain experts, followed by careful selection and manual annotation of representative pages drawn from across the multi-century corpus. These annotations captured key structural elements (texts, titles, images, advertisements, tables, and more), forming a domain-specific ground-truth dataset that reflects the diversity and complexity of historical medical publications. Using this dataset, we performed an initial fine-tuning of a Mask R-CNN-based model via the LayoutParser toolkit, establishing a baseline capable of segmenting the main layout components of the scanned pages.
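As an illustration of what such a ground-truth dataset can look like on disk, the sketch below builds a COCO-style annotation dictionary, a format commonly used when fine-tuning detection models. The abstract does not state which annotation format SweMPer uses, so the format, the label list, and the helper names `make_coco_dataset` and `class_counts` are assumptions made for this example only.

```python
import json
from collections import Counter

# Hypothetical label set: the abstract names these classes "and more".
CATEGORIES = ["text", "title", "image", "advertisement", "table"]


def make_coco_dataset(pages):
    """Build a COCO-style dict from (file_name, width, height, regions) tuples.

    Each region is (label, x, y, w, h) in pixels, with (x, y) the top-left corner.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    images, annotations = [], []
    for image_id, (file_name, width, height, regions) in enumerate(pages, start=1):
        images.append({"id": image_id, "file_name": file_name,
                       "width": width, "height": height})
        for label, x, y, w, h in regions:
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": cat_ids[label],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    categories = [{"id": i, "name": n} for n, i in cat_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


def class_counts(dataset):
    """Count annotated regions per class name."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return Counter(names[a["category_id"]] for a in dataset["annotations"])


# Illustrative pages, not real SweMPer data.
example_pages = [
    ("page_0001.png", 1000, 1500, [("title", 100, 50, 800, 80),
                                   ("text", 100, 150, 800, 1200)]),
    ("page_0002.png", 1000, 1500, [("table", 120, 300, 760, 500)]),
]
example_dataset = make_coco_dataset(example_pages)
example_json = json.dumps(example_dataset)  # ready to write to an annotation file
```

Counting regions per class at this stage gives an early view of which layout classes are thinly represented before any model is trained.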

To further strengthen model performance, we adopted an iterative refinement strategy in which low-confidence predictions and underrepresented classes were systematically identified. Targeted re-annotation and the addition of new examples enabled successive rounds of re-training that substantially improved robustness and evaluation metrics. Building on this foundation, we then applied a transfer-learning approach using a transformer-based CoDeTR model with a Vision Transformer backbone. After pre-training on the large PubLayNet dataset and subsequent fine-tuning on the SweMPer data, this model achieved markedly higher detection accuracy.
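The selection step in this iterative strategy can be sketched as follows. This is a minimal illustration, not the project's actual procedure: the score threshold, the share cutoff for "underrepresented", the ranking rule, and the function names `rare_classes` and `select_pages` are all assumptions.

```python
def rare_classes(train_counts, min_share=0.05):
    """Return labels whose share of the training annotations is below min_share."""
    total = sum(train_counts.values())
    if total == 0:
        return set()
    return {label for label, n in train_counts.items() if n / total < min_share}


def select_pages(predictions, rare, score_threshold=0.6):
    """Rank pages for re-annotation.

    predictions maps page_id -> list of (label, score) detections.
    A page is selected if it has at least one low-confidence detection or
    at least one detection of a rare class; pages with more such
    detections come first (ties broken by page_id).
    """
    ranked = []
    for page_id, detections in predictions.items():
        n_low = sum(1 for _, score in detections if score < score_threshold)
        n_rare = sum(1 for label, _ in detections if label in rare)
        if n_low or n_rare:
            ranked.append((n_low + n_rare, page_id))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in ranked]


# Illustrative numbers only.
train_counts = {"text": 900, "title": 80, "table": 20}
predictions = {
    "a": [("text", 0.95)],
    "b": [("text", 0.40), ("table", 0.90)],
    "c": [("title", 0.55)],
}
rare = rare_classes(train_counts)
queue = select_pages(predictions, rare)
```

In a loop of this kind, the pages at the front of `queue` would go to annotators, the corrected annotations would be added to the training set, and the model would be re-trained before the next round of selection.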

We report quantitative improvements over the baseline across metrics such as mean average precision and recall. More broadly, we show how this workflow—combining domain expert discussions, domain-specific annotation, human-in-the-loop sampling, and state-of-the-art model adaptation—can be integrated into sustainable digitization pipelines and adapted for other historical document collections. Our results demonstrate that strategic investment in annotation and iterative model tuning enables scalable, high-quality digitization of culturally and historically significant corpora, supporting future digital humanities research, long-term archival preservation, and public access.
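For readers unfamiliar with how such detection metrics are computed, the sketch below shows the building blocks: intersection over union (IoU) between two boxes, and precision and recall for one class on one page at a single IoU threshold. Mean average precision is derived from these quantities by sweeping the score threshold and averaging over classes (and, in COCO-style evaluation, over IoU thresholds); that step is omitted here. The 0.5 threshold and the greedy matching rule are common conventions assumed for this example, not details taken from the presentation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_threshold=0.5):
    """Precision and recall for one class on one page.

    preds is a list of (box, score); gts is a list of ground-truth boxes.
    Predictions are matched greedily in descending score order, and each
    ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for box, _ in sorted(preds, key=lambda p: -p[1]):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall


# Illustrative boxes only: one correct detection, one false positive,
# and one ground-truth region that was missed.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
p, r = precision_recall(preds, gts)
```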

Files

UV_swemper_DariahEU_rome_may_2026.pdf (8.3 MB)
md5:55a6efd67ef008a7251a51debd46fdc5

Additional details

Funding

Stiftelsen Riksbankens Jubileumsfond
Communication of Medicine: Digitization of Swedish Medical Journals, 1781–2011 (SweMPer) IN22-0017

Software