Corpus Litterarum

Ströbel, Phillip Benjamin

doi:10.5281/zenodo.16995048

Published August 29, 2025 | Version v2

Dataset Open

Ströbel, Phillip Benjamin (Data curator)¹

1. University of Zurich

Corpus Litterarum is a line-based annotated dataset of Latin manuscript characters sampled from the Codices Sangallenses CSG 11 and CSG 70, provided by e-codices. Each line image has been annotated at the character level (73 classes) using Roboflow, with a semi-automatic workflow that combines manual annotation and model-assisted labelling. The dataset contains 2,152 line images and 44,407 annotations, distributed across predefined train/validation/test splits. Characters include standard Latin letters, abbreviations, and scribal signs, with suspensions left unresolved. The dataset supports research in palaeography, handwritten text recognition, and character segmentation.
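The totals quoted in the description (73 classes, 44,407 annotations) can serve as a sanity check after unpacking the label archives. The following is a minimal sketch, not the curator's tooling. It assumes that each label file is a plain-text file with one annotation per non-empty line, beginning with an integer class index, as in YOLO-style exports; that layout is not stated in this record and should be checked against README.md and data.yaml. The function names and the directory argument are hypothetical.

```python
from collections import Counter
from pathlib import Path


def class_id_of(line):
    """Return the class index of one annotation line.

    Assumption: the first whitespace-separated field is an integer class
    index (YOLO-style). Any remaining fields (box or polygon coordinates)
    are ignored here.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty annotation line")
    return int(fields[0])


def count_annotations(label_dir):
    """Count annotations per class index over all .txt files under label_dir."""
    counts = Counter()
    for label_file in sorted(Path(label_dir).rglob("*.txt")):
        for line in label_file.read_text(encoding="utf-8").splitlines():
            if line.strip():  # skip blank lines
                counts[class_id_of(line)] += 1
    return counts
```

If the assumption holds, summing `count_annotations(...)` over the unpacked train, validation, and test label directories should give 44,407 annotations in total, with at most 73 distinct class indices.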

Files (109.7 MB)

Name	Size	MD5
data.yaml	711 Bytes	73ff3a388fbe1adc6b38a97f892caa39
README.md	3.8 kB	215d4c156f530665d75c0beeeafe67d4
README.roboflow.txt	901 Bytes	129ecf4cc73086065635b6cca6fa57fd
test-images.zip	10.8 MB	1a998247f849403110e102b9c21f062f
test-labels.zip	192.5 kB	d4aad61a3dc095d539172f5b1d61b309
train-images.zip	78.3 MB	16ffb2b62644c52f190894d6c83f3020
train-labels.zip	1.4 MB	63325d3f2801c88771e75c5c9347e135
valid-images.zip	18.7 MB	46534797d4877322465a574d89f58981
valid-labels.zip	328.9 kB	79e9b789128743d0cc816014adf7618c
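The MD5 checksums in the file listing can be used to confirm that downloads arrived intact. A minimal sketch using only the Python standard library, assuming the files were saved under their listed names in a single local directory; the helper names `md5_of` and `verify` and the directory argument are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "data.yaml": "73ff3a388fbe1adc6b38a97f892caa39",
    "README.md": "215d4c156f530665d75c0beeeafe67d4",
    "README.roboflow.txt": "129ecf4cc73086065635b6cca6fa57fd",
    "test-images.zip": "1a998247f849403110e102b9c21f062f",
    "test-labels.zip": "d4aad61a3dc095d539172f5b1d61b309",
    "train-images.zip": "16ffb2b62644c52f190894d6c83f3020",
    "train-labels.zip": "63325d3f2801c88771e75c5c9347e135",
    "valid-images.zip": "46534797d4877322465a574d89f58981",
    "valid-labels.zip": "79e9b789128743d0cc816014adf7618c",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(download_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify("path/to/downloads")` returns one status per listed file; any `mismatch` suggests a corrupted or outdated download that should be fetched again.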

Additional details

Subtitle: A Ground Truth for 8th Century Character Recognition

Development Status: Active

	All versions	This version
Views	2,016	1,993
Downloads	334	293
Data volume	3.5 GB	3.2 GB
