Published October 20, 2020 | Version 1.0
Dataset Open

EPARCHOS - Historical Greek handwritten document dataset

  • 1. Democritus University of Thrace, Department of Electrical and Computer Engineering, 67100 Xanthi, Greece

Description

The dataset originates from a Greek handwritten codex dating from around 1500-1530. It is a subset of the codex British Museum Addit. 6791, written by two hands, one that of Antonius Eparchos and the other that of Camillos Zanettus (ff. 104r-174v). It contains texts by Hierocles (In Aureum carmen), Matthaeus Blastares (Collectio alphabetica) and, notably, Michael Psellos (De omnifaria doctrina). The writing exhibits the most important abbreviations, logograms and conjunctions, which appear in virtually every Greek minuscule handwritten codex from the years of the manuscript transliteration and the prevalence of the minuscule script (9th century) to the post-Byzantine years.

The dataset consists of 120 scanned handwritten text pages containing 9285 lines of text and 18809 words (6787 unique words). For each page, a PageXML file is provided containing the following ground truth:

  1. Text region polygon coordinates
  2. Text line polygon coordinates with the corresponding transcription text
  3. Word polygon coordinates with the corresponding transcription text
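
The three ground-truth levels listed above could be read with a few lines of Python. The following is a minimal sketch, not part of the dataset: it assumes the files follow the common PAGE XML layout (TextRegion, TextLine and Word elements, each with a Coords element carrying a `points` attribute and a TextEquiv/Unicode transcription). The function names `parse_points` and `read_page` are illustrative, and the actual schema version used in the archive has not been checked here.

```python
import xml.etree.ElementTree as ET


def parse_points(points):
    """Turn a PAGE 'points' string such as '10,20 30,40' into [(10, 20), (30, 40)]."""
    return [tuple(int(float(v)) for v in pair.split(",")) for pair in points.split()]


def read_page(xml_data):
    """Return a list of regions, each with its lines and words (assumed PAGE layout)."""
    root = ET.fromstring(xml_data)
    # The namespace URI differs between PAGE schema versions, so it is read
    # from the root tag instead of being hard-coded.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

    def polygon(elem):
        coords = elem.find(ns + "Coords")
        return parse_points(coords.get("points", "")) if coords is not None else []

    def text(elem):
        # Look only at the element's own TextEquiv (a direct child),
        # so a line does not pick up the transcription of one of its words.
        equiv = elem.find(ns + "TextEquiv")
        uni = equiv.find(ns + "Unicode") if equiv is not None else None
        return uni.text or "" if uni is not None else ""

    regions = []
    for region in root.iter(ns + "TextRegion"):
        lines = []
        for line in region.findall(ns + "TextLine"):
            words = [
                {"polygon": polygon(w), "text": text(w)}
                for w in line.findall(ns + "Word")
            ]
            lines.append({"polygon": polygon(line), "text": text(line), "words": words})
        regions.append({"polygon": polygon(region), "lines": lines})
    return regions
```

`read_page` accepts the XML content as text or bytes, for example the result of reading one of the per-page files from the extracted archive. The file names and folder layout inside the archive are not described on this page, so they are left to the reader.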

Files

eparchos.zip (114.7 MB)
md5:e172aa8b4017436e37cf25991b5bf8b8
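
The MD5 checksum listed for eparchos.zip can be used to confirm that a download is complete and uncorrupted. The sketch below uses Python's standard hashlib module. The helper name `md5_of_file` and the local path in the usage note are illustrative assumptions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum published on this page for eparchos.zip.
EXPECTED_MD5 = "e172aa8b4017436e37cf25991b5bf8b8"
```

Assuming the archive was saved as `eparchos.zip` in the working directory, `md5_of_file("eparchos.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download.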