Published December 12, 2025 | Version v1
Dataset Open

HTR Winter School 2025 - Late Medieval Latin

Description:

This dataset contains ground truth for late-medieval Latin scripts, based on selected Central European manuscripts dating from the 13th to 15th centuries. It was created by participants of the Late Medieval Latin group during the Winter School: HTR of Historical Sources, held in Vienna in 2025.
 
⚠️ With the exception of folios 68v–75v of the manuscript Vienna, Austrian National Library (Österreichische Nationalbibliothek), Cod. 3445 - circa 1485-1496, all transcriptions have been reviewed at least once. They may nevertheless still contain errors.
 
All transcriptions are also available in the GitHub repository: https://github.com/HTR-School-Vienna/2025--late-medieval-latin
 
🌐 You can browse transcriptions directly on GitHub Pages: https://htr-school-vienna.github.io/2025--late-medieval-latin/
 

Dataset:

Přibík of Pulkava’s Chronicle

02 - Prague, National Library of the Czech Republic (Národní knihovna ČR), sign. I C 24 - second half of the 15th c.

 
✍️ Transcription: Leila Leoni: 38v-45r

06 - Prague, National Museum Library (Knihovna Národního muzea), sign. VIII F 49 („Stehlíkovský“) - first half of the 15th c.

🔗 Digital images of the manuscript:  http://www.manuscriptorium.com/apps/index.php?direct=record&pid=AIPDIG-NMP___VIII_F_49___2Z378T3-cs 

✍️ Transcription:

  • Hagar Barak: 7v-15r
  • Olga Kalashnikova: 15v-22v
  • Jennifer Kostoff-Kaard: 23v-30v
  • Angelica Coppola: 38v-44r

08 - Prague, National Library of the Czech Republic (Národní knihovna ČR), sign. XIX B 5 - 1380-1500

🔗 Digital images of the manuscript: http://www.manuscriptorium.com/apps/index.php?direct=record&pid=AIPDIG-NKCR__XIX_B_5_____1OYMOQD-cs

✍️ Transcription: Urszula Zachara-Związek: XXIIIIr-XXXIr

17 - Munich, Bavarian State Library (Bayerische Staatsbibliothek), sign. Clm. 476 - circa 1476

🔗 Digital images of the manuscript: https://www.digitale-sammlungen.de/en/view/bsb00076444?page=,1 

✍️ Transcription:

  • Aliaksandra Valodzina: 18v-24v
  • Fabian Andre: 31r-38r

19 - Prague, National Library of the Czech Republic (Národní knihovna ČR), sign. I D 10 - second half of the 15th c.

🔗 Digital images of the manuscript: http://www.manuscriptorium.com/apps/index.php?direct=record&pid=AIPDIG-NKCR__I_D_10______0NXXRKA-cs

✍️ Transcription: Agnieszka Ziemińska: 127v-131r

Aeneas Silvius Piccolomini: Historia bohemica

Vienna, Austrian National Library (Österreichische Nationalbibliothek), Cod. 3445 - circa 1485-1496

🔗 Digital images of the manuscript: https://viewer.onb.ac.at/10264119/9 

✍️ Transcription:

  • Federico Rossi: 35r-42r
  • Petr Šámal: 42v-49v
  • Denise Ugliano: 50r-57r
  • Noam Lefler (not reviewed): 68v-75v

Cosmas of Prague

Vienna, Austrian National Library (Österreichische Nationalbibliothek), Cod. 508 - 13th c.

🔗 Digital images of the manuscript: https://viewer.onb.ac.at/13227113 

✍️ Transcription:

  • Thomas Paul Warren: 1r-6v
  • Amelie Paulsen: 17r-24r
  • Neal Bold: 7r-9r, 26r-30v
  • Michał Tadeusz Noworyta: 9v-16v

Dalimil’s Chronicle

Latin Fragment - 14th c., Prague, National Library of the Czech Republic (Národní knihovna ČR), XII E 17

🔗 Digital images of the manuscript: https://www.manuscriptorium.com/hub/catalog/default/detail/single/manuscriptorium%7CRASTIS-NKCR__XII_E_17____481A7Q1-cs?lang=cs 

✍️ Transcription: Riccardo Cassi: 1r, 2r, 5r-v, 7r-9v, 10v-12r

Transcription guidelines:

For the transcription guidelines, see the corresponding file. 

Data organisation:

The data are organised in eight folders, one per manuscript. Each folder contains two subfolders, 'Ground Truth' and 'Done'. 'Ground Truth' holds transcribed pages that the organisers have checked; 'Done' holds finished pages that have not been checked. The transcriptions are PageXML files, one file per page. Each manuscript folder also contains an image folder with the manuscript page images as JPEG files, one file per page, named according to the folio number of the page.
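
As an illustration of how this layout could be read, the following is a minimal Python sketch rather than an official loader. It assumes that the subfolders on disk are named exactly 'Ground Truth' and 'Done', that the PageXML files carry the extension .xml, and that each line's transcription sits in the usual TextLine / TextEquiv / Unicode elements. The function names are hypothetical. XML namespaces are ignored, so the PageXML schema version should not matter.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def lines_from_pagexml(xml_data):
    """Return the transcription of each TextLine, in document order.

    xml_data may be str or bytes holding one PageXML document.
    """
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        # Look only at direct children, so that any word-level
        # TextEquiv nested inside <Word> elements is skipped.
        for child in elem:
            if local(child.tag) != "TextEquiv":
                continue
            uni = next((u for u in child if local(u.tag) == "Unicode"), None)
            if uni is not None:
                lines.append(uni.text or "")
            break
    return lines


def collect(dataset_root, subfolder="Ground Truth"):
    """Map every PageXML path under <manuscript>/<subfolder>/ to its lines.

    The folder name and the '.xml' extension are assumptions about the
    unpacked archives; adjust them if the files on disk differ.
    """
    result = {}
    for xml_path in sorted(Path(dataset_root).glob(f"*/{subfolder}/*.xml")):
        result[xml_path] = lines_from_pagexml(xml_path.read_bytes())
    return result
```

Calling collect() on the directory into which the archives were unpacked would gather the checked pages; passing subfolder="Done" would gather the unchecked ones instead.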

How to cite:

This dataset was created by: Jan Odstrčilík, Zuzana Čermáková Lukšová, Annamária Kovács, Fabian Andre, Hagar Barak, Neal Bold, Riccardo Cassi, Angelica Coppola, Olga Kalashnikova, Jennifer Kostoff-Kaard, Noam Lefler, Leila Leoni, Michał Tadeusz Noworyta, Amelie Paulsen, Federico Rossi, Petr Šámal, Denise Ugliano, Aliaksandra Valodzina, Thomas Paul Warren, Urszula Zachara-Związek, Agnieszka Ziemińska.

The digitised images are not free of copyright, but the transcriptions are. Even so, annotating a corpus properly takes time, and that work should be recognised. If you use any item from this corpus as ground truth, please cite the dataset using the following information:

 

Files

02_-_Praha,_NKP,_I_C_24_-_Winter_School.zip

Files (207.6 MB)

Checksum - Size

  • md5:f843ec49d3bd4e0f95d92331f877c88b - 5.4 MB
  • md5:8549d060a84edb20fb5d5436b022ada2 - 23.7 MB
  • md5:93550330b24e2405eb59319c57d20d7f - 78.5 MB
  • md5:76a890bd06256c36ec961fcf994923af - 2.8 MB
  • md5:78b2c5bdaa7945b315e0e0fa9151f8e4 - 23.7 MB
  • md5:3f52b82f14df4b4b5356b0dd89734e53 - 17.3 MB
  • md5:5e9c4efe3d5f12c96ce708bee7c24033 - 2.9 kB
  • md5:0231f4dc322ddb88a340e2415c06a50a - 36.9 MB
  • md5:83910fff97068e3283c3aa0c4b5bb745 - 4.0 kB
  • md5:648395e0903dcf565842c2d39ec95e17 - 1.0 kB
  • md5:8cedbb6ef9bf09bf874c18e38589aa74 - 5.4 kB
  • md5:adebe88caf56e2406aba8a863c81faab - 19.2 MB