Published April 28, 2023 | Version 1
Dataset | Open access

Dataset for Paper: A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents

  • 1. Institute of Informatics and Telecommunications, National Centre for Scientific Research "Demokritos"
  • 2. Institute for Language and Speech Processing - Athena Research and Innovation Center

Description

Dataset for the paper "A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents" by P. Kaddas, K. Palaiologos, B. Gatos, V. Katsouros and K. Christopoulou, presented at the 17th International Conference on Document Analysis and Recognition (ICDAR 2023), San Jose, California, USA.

The dataset is drawn from 57 pages of the third edition of the Greek New Testament published by Robert Estienne (1503–1559), who was appointed “Royal Typographer” by King François I of France (1494–1547). Estienne produced this edition in 1550 using the grecs du roi typeface, cut by Claude Garamont on the basis of the Greek minuscule style of the Cretan calligrapher Angelos Vergikios (1505–1569), who was active as a copyist of Greek manuscripts in Venice and France.

It comprises 2,045 cropped text-line images in .png format, each paired with its corresponding OCR transcription in .txt format: 1,431 lines are used for training, 204 for validation and 410 for testing. The original page images were obtained from: https://bibles-online.net/flippingbook/1550/
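As a quick sanity check on the split described above, the sketch below computes each split's share of the 2,045 lines (roughly 70/10/20) and pairs every line image with its transcription. The pairing assumes that each NAME.png sits next to a NAME.txt with the same file stem; this layout, and the helper names `split_fractions` and `pair_lines`, are hypothetical and not taken from the archive itself.

```python
from pathlib import Path

# Split sizes as stated in the dataset description.
SPLITS = {"train": 1431, "validation": 204, "test": 410}


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of line images."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


def pair_lines(folder: Path) -> list[tuple[Path, str]]:
    """Pair every line image NAME.png with the text stored in NAME.txt.

    ASSUMPTION (hypothetical layout): image and transcription share a file
    stem and live in the same folder; the real structure inside
    icdar2023_dataset.zip may differ.
    """
    pairs = []
    for image in sorted(folder.glob("*.png")):
        text_file = image.with_suffix(".txt")
        if not text_file.exists():
            raise FileNotFoundError(f"missing transcription for {image.name}")
        pairs.append((image, text_file.read_text(encoding="utf-8").strip()))
    return pairs


if __name__ == "__main__":
    print(sum(SPLITS.values()))
    for name, share in split_fractions(SPLITS).items():
        print(f"{name}: {share:.1%}")
```

Run as a script, it prints 2045 followed by train: 70.0%, validation: 10.0% and test: 20.0%.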

Notes

This research has been partially co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call "RESEARCH-CREATE-INNOVATE", project Culdile (Cultural Dimensions of Deep Learning, project code: Τ1ΕΔΚ-03785) and the Operational Program Attica 2014-2020, under the call "RESEARCH AND INNOVATION PARTNERSHIPS IN THE REGION OF ATTICA", project reBook (Digital platform for re-publishing Historical Greek Books, project code: ΑΤΤΡ4-0331172).

Files (385.9 MB)

Name                     Size       Checksum
icdar2023_dataset.zip    385.9 MB   md5:8b36d55c1024b52ec3ffd64bc77fc4f5
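Before unpacking, the download can be checked against the published MD5 checksum. The sketch below streams the archive through Python's `hashlib.md5` in 1 MiB chunks; the local path (icdar2023_dataset.zip in the working directory) and the helper names `read_chunks`, `md5_of_chunks` and `verify_archive` are assumptions for illustration.

```python
import hashlib
from collections.abc import Iterable, Iterator
from pathlib import Path

# MD5 checksum published for icdar2023_dataset.zip on the dataset page.
EXPECTED_MD5 = "8b36d55c1024b52ec3ffd64bc77fc4f5"


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in 1 MiB chunks so the ~386 MB archive is never held in memory at once."""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into a single MD5 digest and return it as lowercase hex."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_chunks(read_chunks(path)) == expected.lower()


if __name__ == "__main__":
    # Hypothetical download location: the current working directory.
    archive = Path("icdar2023_dataset.zip")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH: re-download the archive")
```

On Linux, `md5sum icdar2023_dataset.zip` should print the same digest; hashing in chunks keeps memory use constant regardless of archive size.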

Additional details

Related works