Published April 15, 2024 | Version Capricciosa
Model Open

SegmOnto

  • 1. University of Geneva
  • 2. Histoire, Archéologie, Littératures des Mondes Chrétiens et Musulmans Médiévaux
  • 3. Centre National de la Recherche Scientifique

Contributors

Project manager:

  • 1. École Nationale des Chartes

Description

SegmOnto

Layout analysis model trained with YALTAi, relying on YOLO models, and Kraken. Data are annotated with the SegmOnto controlled vocabulary. Most of the training data are French texts, mainly but not exclusively prints, produced by the Gallic(orpor)a, FoNDUE and SETAF projects.

If you need to cite the paper:

@inproceedings{solfrini_OCR_2024,
  author={Solfrini, Sonia and Gabay, Simon and Pinche, Ariane and Beaulnes, Pierre-Olivier and Marques Oliveira, Aurélia and Gross, Geneviève and Solfaroli Camillocci, Daniela},
  title={Océriser les imprimés du XVIe siècle en langue française : le cas d'un corpus romand en caractères gothiques},
  address={Meknes, Morocco},
  year={2024},
  month={May},
  booktitle={Humanistica 2024},
  publisher={Association francophone des humanités numériques}
}

 

Files

Files (200.1 MB)

Checksum Size
md5:addef2cf2f746795850d058ec4d16081 5.1 MB
md5:cdb9f447aa9b11989a7f30e08538c830 52.1 MB
md5:92b9d17dc3ce75b08f5682540b41fc5d 6.3 MB
md5:bb13144d1e7e7d57b7579bb26ff94210 136.8 MB
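
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch, assuming Python and the standard hashlib module. Since the listing gives no file names, it only tests whether a file's digest is one of the four listed values. The function names and the directory argument are hypothetical and not part of the record.

```python
import hashlib
from pathlib import Path

# md5 values copied from the record's file listing.
# The listing gives no file names, so the digests are kept as a plain set.
EXPECTED_MD5 = {
    "addef2cf2f746795850d058ec4d16081",
    "cdb9f447aa9b11989a7f30e08538c830",
    "92b9d17dc3ce75b08f5682540b41fc5d",
    "bb13144d1e7e7d57b7579bb26ff94210",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """True if the file's md5 digest is one of the values in the listing."""
    return md5_of_file(path) in EXPECTED_MD5


def check_directory(directory):
    """Map each regular file name in `directory` (hypothetical path) to its check result."""
    return {
        p.name: is_listed(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

Calling check_directory on the folder holding the downloads returns a dictionary of file names and booleans. A False entry means the file's digest matches none of the listed values, which usually points to an incomplete or corrupted download.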

Additional details

Related works

Is documented by
Journal article: https://hal.science/hal-04343404 (URL)
Conference paper: https://hal.science/HUMANISTICA-2024/hal-04555002v1 (URL)