Dataset for Paper: A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents
Creators
- 1. Institute of Informatics and Telecommunications, National Centre for Scientific Research "Demokritos"
- 2. Institute for Language and Speech Processing - Athena Research and Innovation Center
Description
Dataset for the paper: "A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents", P. Kaddas, K. Palaiologos, B. Gatos, V. Katsouros, K. Christopoulou, 17th International Conference on Document Analysis and Recognition (ICDAR 2023), San Jose, California, USA
The dataset is drawn from 57 pages of the third edition of the Greek New Testament published by Robert Estienne (1503–1559), who was appointed “Royal Typographer” by the King of France, François I (1494–1547). Robert Estienne printed this edition in 1550 using the grecs du roi typeface, produced by Claude Garamont on the basis of the Greek minuscule style of the calligrapher Angelos Vergikios (1505–1569) from Crete, who was active copying Greek manuscripts in Venice and France. The dataset consists of 2045 cropped text line images in .png format, each with its corresponding OCR text in .txt format, of which 1431 are used for training, 204 for validation and 410 for testing. The original page images were acquired from: https://bibles-online.net/flippingbook/1550/
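The image/text pairing described above can be sketched in Python as follows. This is a minimal sketch, not the authors' loading code: it assumes each line image `name.png` sits next to a same-named `name.txt` file and that the text files are UTF-8 encoded. Neither the archive's internal folder layout nor the encoding is stated in this record, and the function names `pair_lines` and `split_sizes_consistent` are hypothetical.

```python
from pathlib import Path


def pair_lines(root):
    """Return sorted (image_path, text) pairs found anywhere under root.

    Assumption: every line image `name.png` has a sibling `name.txt`
    holding its text, encoded as UTF-8. Images without a matching
    text file are skipped.
    """
    pairs = []
    for png in sorted(Path(root).rglob("*.png")):
        txt = png.with_suffix(".txt")
        if txt.is_file():
            text = txt.read_text(encoding="utf-8").strip()
            pairs.append((png, text))
    return pairs


def split_sizes_consistent(train=1431, val=204, test=410, total=2045):
    """Check that the split sizes stated in the description sum to the total."""
    return train + val + test == total
```

Calling `pair_lines` on the directory where the archive was extracted should yield one pair per text line, provided the layout assumption holds. `split_sizes_consistent()` only confirms that the stated counts (1431 + 204 + 410) add up to 2045.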
Files
Name | Size |
---|---|
icdar2023_dataset.zip (md5:8b36d55c1024b52ec3ffd64bc77fc4f5) | 385.9 MB |
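The MD5 checksum listed for the archive can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module. The helper name `md5_of_file` and the local path in the usage note are assumptions, while the expected digest is copied from the file listing above.

```python
import hashlib

# Digest copied from the file listing for icdar2023_dataset.zip.
EXPECTED_MD5 = "8b36d55c1024b52ec3ffd64bc77fc4f5"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved under its original name in the current directory, `md5_of_file("icdar2023_dataset.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download.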
Additional details
Related works
- Is derived from
- Other: https://bibles-online.net/flippingbook/1550/ (URL)