HTR Winter School 2024 - Syriac, ÖNB Cod. Syr. 1
Creators
- Aboud Ishac, Ephrem (Project leader)
- Roughan, Christine (Project leader)
- Awad, Ammar (Researcher)
- Emilio, Carlo Biuzzi (Researcher)
- Chandran, Saranya (Researcher)
- Griggs, Jennifer (Researcher)
- Ivanova, Polina (Researcher)
- Malešević, Branko (Researcher)
- Marić, Stefan (Researcher)
- Nateri, Francesca (Researcher)
- Petrov, Ivan (Researcher)
- Tava, Cristina (Researcher)
- Thomas, Maria S. (Researcher)
Description
Ground truth of 140 folios of ÖNB Cod. Syr. 1. This ground truth was produced by participants of the Vienna 2024 HTR Winter School, who used Transkribus to manually correct a preliminary automatic transcription that had been generated using Kraken/eScriptorium.
Description
- Vienna, Austrian National Library, MS ÖNB Cod. Syr. 1
- Syriac, Serto, 16th century
- Codex dated 1545 AD, scribed by Moses of Mardin in Vienna
Origin of the data
Source of the images: Österreichische Nationalbibliothek. See the online digitization at https://digital.onb.ac.at/RepViewer/viewer.faces?doc=DTL_2933415.
Segmentation and Transcription guidelines
The segmentation of the folios followed the SegmOnto vocabulary for annotation of regions:
- MainZone: the main column of text.
- MainZone-gold: any sections of the main column where the text is written in gold block characters, as at the start of the text. (The - character is a substitution for SegmOnto's recommended : character for declaring subtypes, since Transkribus did not allow for use of the colon character in the region name.)
- MarginTextZone: any marginal words or phrases, including catchwords. Also used for interlinear glosses.
- NumberingZone: any page or folio numbers.
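The hyphen-for-colon substitution described above can be undone when the region labels are reused outside Transkribus. The following is a minimal sketch, not part of the dataset's tooling: the function name `to_segmonto` and the set `KNOWN_ZONES` are hypothetical, and the sketch assumes that region labels have already been read from the export as plain strings.

```python
# Region types listed in the segmentation guidelines of this dataset.
KNOWN_ZONES = {"MainZone", "MarginTextZone", "NumberingZone"}


def to_segmonto(region_name: str) -> str:
    """Turn a Transkribus label such as 'MainZone-gold' into 'MainZone:gold'.

    Hypothetical helper: only the first hyphen is treated as the subtype
    separator, and labels without a subtype are returned unchanged.
    """
    zone, sep, subtype = region_name.partition("-")
    if zone not in KNOWN_ZONES:
        raise ValueError(f"unexpected region type: {region_name!r}")
    if sep and subtype:
        return f"{zone}:{subtype}"
    return zone
```

For example, `to_segmonto("MainZone-gold")` gives `"MainZone:gold"`, while `to_segmonto("NumberingZone")` is returned as is. Rejecting unknown zone names is a design choice of this sketch, intended to surface mislabeled regions early.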
The transcription includes spaces, the Syriac letters, some diacritics, and punctuation. It does not include vowel dots or markings.
- Allowed diacritics:
  - Syome
  - Dots over feminine suffix heh
  - Dots in pronouns: above for demonstrative, below for personal
  - Dots in verbs: to distinguish participles and perfects
  - Dots to distinguish homographs
- Excluded diacritics:
  - Vowel dots
  - Dots of hardening and softening (qushoyo and rukokho)
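A transcription line can be checked against the excluded categories above with a short script. This is a sketch under assumptions, since the dataset description does not state how diacritics are encoded: it assumes that vowel marks would appear as the dedicated Unicode Syriac vowel points (U+0730 to U+073F) and that qushoyo and rukokho would appear as U+0741 and U+0742. Dots entered as generic combining marks would not be caught. The names `EXCLUDED_MARKS` and `find_excluded_marks` are hypothetical.

```python
import unicodedata

# Assumed encoding of the excluded diacritics (see the lead-in):
# U+0730..U+073F are the Syriac vowel points in Unicode,
# U+0741 is SYRIAC QUSHSHAYA and U+0742 is SYRIAC RUKKAKHA.
EXCLUDED_MARKS = set(range(0x0730, 0x0740)) | {0x0741, 0x0742}


def find_excluded_marks(line: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every excluded mark in a line."""
    hits = []
    for index, char in enumerate(line):
        if ord(char) in EXCLUDED_MARKS:
            # Fall back to the code point if the character has no name.
            name = unicodedata.name(char, f"U+{ord(char):04X}")
            hits.append((index, name))
    return hits
```

An empty result means that none of the assumed code points occur in the line. It does not prove that the line follows the guidelines, because the allowed dots (for example syome) are deliberately not inspected here.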
Punctuation marks were not normalized, but rather transcribed as they appear in the manuscript (. ܆ ܇ : ܀).
Transkribus's unclear tag was used when readings were uncertain or the text was damaged or unclear.
Copyright and licence
This dataset was created as part of the Winter School of Handwritten Text Recognition of Medieval Manuscripts 2024, held in Vienna at the Österreichische Akademie der Wissenschaften, Institut für Mittelalterforschung. All transcriptions are licensed under a Creative Commons 4 licence. Images were provided by the Austrian National Library (ÖNB) and are licensed under a Creative Commons 4 licence.