Corpus Litterarum
Description
Corpus Litterarum is a line-based annotated dataset of Latin manuscript characters sampled from the Codices Sangallenses CSG 11 and CSG 70, provided by e-codices. Each line image has been annotated at the character level (73 classes) using Roboflow, with a semi-automatic workflow that combines manual annotation and model-assisted labelling. The dataset contains 2,152 line images and 44,407 annotations, distributed across predefined train/validation/test splits. Characters include standard Latin letters, abbreviations, and scribal signs, with suspensions left unresolved. The dataset supports research in palaeography, handwritten text recognition, and character segmentation.
Files
(109.7 MB)
Checksum | Size |
---|---|
md5:73ff3a388fbe1adc6b38a97f892caa39 | 711 Bytes |
md5:215d4c156f530665d75c0beeeafe67d4 | 3.8 kB |
md5:129ecf4cc73086065635b6cca6fa57fd | 901 Bytes |
md5:1a998247f849403110e102b9c21f062f | 10.8 MB |
md5:d4aad61a3dc095d539172f5b1d61b309 | 192.5 kB |
md5:16ffb2b62644c52f190894d6c83f3020 | 78.3 MB |
md5:63325d3f2801c88771e75c5c9347e135 | 1.4 MB |
md5:46534797d4877322465a574d89f58981 | 18.7 MB |
md5:79e9b789128743d0cc816014adf7618c | 328.9 kB |
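The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal Python sketch using only the standard library. The function names `md5_of_file` and `verify_download` are illustrative, not part of the dataset, and because the listing gives no file names, any path passed in is a placeholder you must supply yourself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, listed_checksum):
    """Compare a file's MD5 with a listed value such as 'md5:73ff3a38...'.

    The optional 'md5:' prefix is stripped and the comparison is
    case-insensitive.
    """
    expected_hex = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_download("path/to/downloaded_file", "md5:16ffb2b62644c52f190894d6c83f3020")` would return `True` only if the local file (a hypothetical path here) matches the 78.3 MB entry in the listing. Reading in chunks keeps memory use low for the larger files.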
Additional details
Additional titles
- Subtitle
- A Ground Truth for 8th Century Character Recognition
Software
- Development Status
- Active