Published January 21, 2025 | Version 1.0.0
Dataset Open

HTR Winter School 2024 - Syriac, ÖNB Cod. Syr. 1

Description

Ground truth of 140 folios of ÖNB Cod. Syr. 1. This ground truth was produced by participants of the Vienna 2024 HTR Winter School, who used Transkribus to manually correct a preliminary automatic transcription that had been generated using Kraken/eScriptorium.

Description of the manuscript

  • Vienna, Austrian National Library, MS ÖNB Cod. Syr. 1
  • Syriac, Serto, 16th century
  • Codex dated 1545 AD, copied by Moses of Mardin in Vienna

Origin of the data

Source of the images: Österreichische Nationalbibliothek. See the online digitization at https://digital.onb.ac.at/RepViewer/viewer.faces?doc=DTL_2933415.
 

Segmentation and Transcription guidelines

The segmentation of the folios followed the SegmOnto vocabulary for annotation of regions:
  • MainZone: the main column of text.
  • MainZone-gold: any section of the main column where the text is written in gold block characters, as at the start of the text. (The - character substitutes for the : character that SegmOnto recommends for declaring subtypes, since Transkribus did not allow the colon character in region names.)
  • MarginTextZone: any marginal words or phrases, including catchwords. Also used for interlinear glosses.
  • NumberingZone: any page or folio numbers.
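To compare these region labels with other SegmOnto-annotated data, the hyphenated subtype can be mapped back to the colon form. The following is a minimal sketch; the function name to_segmonto is hypothetical, and it assumes that a hyphen in a region name only ever marks a subtype, as in the labels listed above.

```python
def to_segmonto(region_name: str) -> str:
    """Map a Transkribus-safe label such as 'MainZone-gold' to 'MainZone:gold'.

    Assumption: the first hyphen separates the zone from its subtype.
    Labels without a hyphen are returned unchanged.
    """
    base, sep, subtype = region_name.partition("-")
    return f"{base}:{subtype}" if sep else region_name


if __name__ == "__main__":
    for label in ("MainZone", "MainZone-gold", "MarginTextZone", "NumberingZone"):
        print(label, "->", to_segmonto(label))
```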

The transcription includes spaces, Syriac letters, punctuation, and a limited set of diacritics. Vowel dots and other vowel markings are not transcribed.

  • Allowed diacritics:
    • Syome
    • Dots over feminine suffix heh
    • Dots in pronouns: above for demonstrative, below for personal
    • Dots in verbs: to distinguish participles and perfects
    • Dots to distinguish homographs
  • Excluded diacritics:
    • Vowel dots
    • Dots of hardening and softening (qushoyo and rukokho)
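When merging this ground truth with other Syriac transcriptions, it can help to check that a text contains none of the excluded marks. The sketch below is illustrative only: it assumes the excluded marks would be encoded with the Unicode Syriac vowel points (U+0730 to U+073E) and with SYRIAC QUSHSHAYA (U+0741) and SYRIAC RUKKAKHA (U+0742). How the dataset encodes the allowed diacritics is not stated here, so the code leaves every other character untouched. The function names are hypothetical.

```python
import unicodedata

# Assumption: excluded marks are the Syriac vowel points U+0730..U+073E
# plus qushshaya (U+0741) and rukkakha (U+0742).
EXCLUDED_MARKS = {chr(cp) for cp in range(0x0730, 0x073F)} | {"\u0741", "\u0742"}


def find_excluded_marks(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every excluded mark found in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in EXCLUDED_MARKS
    ]


def strip_excluded_marks(text: str) -> str:
    """Remove excluded marks and keep all other characters as they are."""
    return "".join(ch for ch in text if ch not in EXCLUDED_MARKS)


if __name__ == "__main__":
    # Beth followed by a vowel point, a qushshaya, and a combining diaeresis
    # (the diaeresis stands in for an allowed mark and must survive).
    sample = "\u0712\u0730\u0741\u0308"
    print(find_excluded_marks(sample))
    print([f"U+{ord(c):04X}" for c in strip_excluded_marks(sample)])
```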

Punctuation marks were not normalized, but rather transcribed as they appear in the manuscript (. ܆ ܇ : ܀).

Transkribus's unclear tag was used when readings were uncertain or the text was damaged or unclear.
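Readers who want to mask or down-weight uncertain passages during training need to locate these tags. The sketch below assumes the files are Transkribus PAGE XML exports in which text-level tags are stored in each TextLine's custom attribute in the form "unclear {offset:2; length:3;}", with offsets counted in characters from the start of the line. That layout is an assumption about the export and is not described in this record. The function names are hypothetical.

```python
import re

# Matches "tagName {key:value; key:value;}" groups inside a custom attribute.
TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")


def parse_custom(custom: str) -> list[tuple[str, dict[str, str]]]:
    """Split a custom attribute string into (tag name, attributes) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom):
        attrs = {}
        for part in body.split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                attrs[key.strip()] = value.strip()
        tags.append((name, attrs))
    return tags


def unclear_spans(line_text: str, custom: str) -> list[str]:
    """Return the substrings of line_text covered by 'unclear' tags."""
    spans = []
    for name, attrs in parse_custom(custom):
        if name == "unclear":
            start = int(attrs["offset"])
            length = int(attrs["length"])
            spans.append(line_text[start:start + length])
    return spans


if __name__ == "__main__":
    custom = "readingOrder {index:0;} unclear {offset:2; length:3;}"
    print(unclear_spans("abcdefg", custom))
```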

Copyright and licence

This dataset was created as part of the Winter School of Handwritten Text Recognition of Medieval Manuscripts 2024, held in Vienna at the Österreichische Akademie der Wissenschaften, Institut für Mittelalterforschung. All transcriptions are licensed under the Creative Commons 4 licence. Images were provided by the Austrian National Library (ÖNB) and are also licensed under the Creative Commons 4 licence.

Files

page.zip

Files (469.7 MB)

Checksum and size of each file:
  • md5:565dc61d839856f68ebc3b9cc3b1acbc (1.7 kB)
  • md5:6f3b8ad3d6caaf0f790a0993afcfe6bf (3.5 kB)
  • md5:a890f61ac5caa1a62c141327be56b696 (468.6 MB)
  • md5:1e06e5814a3c08daf4e9158289231dc1 (1.1 MB)
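The MD5 checksums listed for the files can be used to confirm that a download is complete. A minimal sketch using only the Python standard library follows; the helper names are hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in chunks so large archives need not fit in memory."""
    with Path(path).open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of the concatenated chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a local file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.removeprefix("md5:").lower()
    return md5_hex(file_chunks(path)) == expected
```

For example, matches_listed_checksum("page.zip", "md5:<digest from the list>") returns True only when the local copy hashes to the listed digest. Which digest belongs to which file has to be taken from the record itself.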