Published July 10, 2022 | Version 1.0.0
Dataset Open

YALTAi: Segmonto Manuscript and Early Printed Book Dataset

Description

This dataset was built to train a layout segmentation model. Annotations are provided in both ALTO and YOLOv5 formats.
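As an illustration of the YOLOv5 side of the annotations, here is a minimal sketch that converts one label line to a pixel bounding box. It assumes the standard YOLOv5 detection label layout (`class x_center y_center width height`, with coordinates normalized to the image size). The function name `yolo_to_pixel_box` is hypothetical, and the mapping from class index to zone name is not shown here because it depends on the dataset's own configuration.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLOv5 label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the standard YOLOv5 layout: class x_center y_center width height,
    with the four coordinates normalized to the range [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])
    x_center, y_center, width, height = (float(value) for value in parts[1:])

    # Move from center/size to corner coordinates, then scale to pixels.
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return class_id, x_min, y_min, x_max, y_max
```

For example, `yolo_to_pixel_box("3 0.5 0.5 0.5 0.25", 1000, 2000)` describes a box centered on a 1000 by 2000 pixel page, half the page wide and a quarter of the page tall.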

This dataset is derived from:

  • CREMMA Medieval (Pinche, A. (2022). Cremma Medieval (Version Bicerin 1.1.0) [Data set]. https://github.com/HTR-United/cremma-medieval )
  • CREMMA Medieval Lat (Clérice, T., & Vlachou-Efstathiou, M. (2022). Cremma Medieval Latin [Data set]. https://github.com/HTR-United/cremma-medieval-lat )
  • Eutyches (Vlachou-Efstathiou, M. Voss.Lat.O.41 - Eutyches "de uerbo" glossed [Data set]. https://github.com/malamatenia/Eutyches )
  • Gallicorpora HTR-Incunable-15e-Siecle (Pinche, A., Gabay, S., Leroy, N., & Christensen, K. Données HTR incunable du 15e siècle [Computer software]. https://github.com/Gallicorpora/HTR-incunable-15e-siecle )
  • Gallicorpora HTR-MSS-15e-Siecle (Pinche, A., Gabay, S., Leroy, N., & Christensen, K. Données HTR manuscrits du 15e siècle [Computer software]. https://github.com/Gallicorpora/HTR-MSS-15e-Siecle )
  • Gallicorpora HTR-imprime-gothique-16e-siecle (Pinche, A., Gabay, S., Vlachou-Efstathiou, M., & Christensen, K. HTR-imprime-gothique-16e-siecle [Computer software]. https://github.com/Gallicorpora/HTR-imprime-gothique-16e-siecle )

It also includes a few hundred newly annotated images. In particular, the test set is entirely new and is drawn from early printed books and manuscripts.

 

Dataset    Number of images
Train      854
Dev        154
Test       139

 

Files

yaltai-segmonto-dataset.zip (2.8 GB)
md5:fc816684e39d3603706e89e3b816c8eb
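Since the page publishes an MD5 checksum for the archive, a download can be checked against it before use. The following is a minimal sketch of such a check. The local path and the helper names `md5_of_file` and `verify_archive` are assumptions for illustration; only the file name and the digest come from this page.

```python
import hashlib
from pathlib import Path

# Checksum listed on the dataset page for yaltai-segmonto-dataset.zip.
EXPECTED_MD5 = "fc816684e39d3603706e89e3b816c8eb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("yaltai-segmonto-dataset.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks matters here because the archive is 2.8 GB and should not be loaded into memory in one piece.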

Additional details

References