Patrologia Graeca (OCRized and analyzed texts)
Description
The CGPG project (Calfa GRE*g*ORI Patrologia Graeca), led by Jean-Marie Auwers (UCLouvain), aims to OCRize the remaining non-digital versions of the Patrologia Graeca volumes. The project relies on the expertise of GREgORI and Calfa.
The project is sponsored by the ASBL *Byzantion*, the Fondation *Sedes Sapientiae*, the Institut *Religions, Spiritualités, Cultures, Sociétés* (RSCS, UCLouvain), the Centre d'études orientales (CIOL, UCLouvain), and a generous donor who wishes to remain anonymous. Other sponsors have recently expressed their willingness to support the project.
This repository contains the Sketch Engine XML files with linguistic markup.
Raw data are available on GitHub: https://github.com/calfa-co/Patrologia-Graeca
For optimal use in Sketch Engine, configure the corpus (Manage Corpus/Configure/Expert settings) as follows:
DOCSTRUCTURE "doc"
ENCODING "UTF-8"
INFO ""
LANGUAGE "Ancient Greek"
NAME "CGPG_20250629"
PATH "/corpora/ca/user_data/sso_1392/manatee/cgpg_20250629"
VERTICAL "| ca_getvertical '/corpora/ca/user_data/sso_1392/registry/cgpg_20250629' 'docx'"
ATTRIBUTE "word" {
    MAPTO "lemma"
}
ATTRIBUTE "intuitive_form" {
}
ATTRIBUTE "lemma" {
}
ATTRIBUTE "intuitive_lemma" {
}
ATTRIBUTE "pos" {
}
ATTRIBUTE "headword" {
}
STRUCTURE "w" {
    DEFAULTLOCALE "C"
    ENCODING "UTF-8"
    LANGUAGE ""
    NESTED ""
    ATTRIBUTE "id" {
        DYNLIB ""
        DYNTYPE "index"
        ENCODING "UTF-8"
        LOCALE "C"
        MULTISEP ","
        MULTIVALUE "n"
        TYPE "MD_MI"
    }
}
STRUCTURE "doc" {
    DEFAULTLOCALE "C"
    ENCODING "UTF-8"
    LANGUAGE ""
    NESTED ""
    ATTRIBUTE "id" {
        DYNLIB ""
        DYNTYPE "index"
        ENCODING "UTF-8"
        LOCALE "C"
        MULTISEP ","
        MULTIVALUE "n"
        TYPE "MD_MI"
    }
}
STRUCTURE "docx" {
    DEFAULTLOCALE "C"
    ENCODING "UTF-8"
    LANGUAGE ""
    NESTED ""
    ATTRIBUTE "id" {
        DYNLIB ""
        DYNTYPE "index"
        ENCODING "UTF-8"
        LABEL "File ID"
        LOCALE "C"
        MULTISEP ","
        MULTIVALUE "n"
        TYPE "MD_MI"
        UNIQUE "1"
    }
    ATTRIBUTE "filename" {
        DYNLIB ""
        DYNTYPE "index"
        ENCODING "UTF-8"
        LABEL "File name"
        LOCALE "C"
        MULTISEP ","
        MULTIVALUE "n"
        TYPE "MD_MI"
    }
}
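Before pasting the configuration into the Expert settings, it can help to confirm that it declares the expected positional attributes and structures. The following is a minimal Python sketch, not part of the CGPG tooling: the function name `summarize_registry` is hypothetical, and the parser only handles the simple `KEYWORD "name" { ... }` layout used above, not the full Sketch Engine registry syntax.

```python
import re

def summarize_registry(text):
    """Return (top-level attributes, {structure: [its attributes]}).

    Simplified reader for the brace layout shown above; it is an
    illustration, not a complete registry parser.
    """
    attributes, structures = [], {}
    current = None  # name of the STRUCTURE block being read, if any
    depth = 0       # current brace nesting depth
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r'(ATTRIBUTE|STRUCTURE)\s+"([^"]+)"', line)
        if match:
            kind, name = match.groups()
            if kind == "STRUCTURE":
                current = name
                structures[name] = []
            elif current is not None and depth >= 1:
                structures[current].append(name)  # attribute of a structure
            else:
                attributes.append(name)           # positional attribute
        depth += line.count("{") - line.count("}")
        if depth == 0:
            current = None  # the structure block has been closed
    return attributes, structures

# Shortened excerpt of the configuration above, used only as a demo input.
SAMPLE = '''
ATTRIBUTE "word" {
    MAPTO "lemma"
}
ATTRIBUTE "lemma" {
}
STRUCTURE "docx" {
    ATTRIBUTE "id" {
        UNIQUE "1"
    }
    ATTRIBUTE "filename" {
    }
}
'''

if __name__ == "__main__":
    attrs, structs = summarize_registry(SAMPLE)
    print("Positional attributes:", attrs)
    print("Structures:", structs)
```

Run against the full configuration, the summary should list the six positional attributes (`word`, `intuitive_form`, `lemma`, `intuitive_lemma`, `pos`, `headword`) and the three structures (`w`, `doc`, `docx`) declared above.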
Bibliography
- KINDT B., AUWERS J.-M., La Fondation Sedes Sapientiae soutient le projet de valorisation numérique de la Patrologie grecque, in Bulletin de la Fondation Sedes Sapientiae, 45 (January 2024), p. 19-21 (https://cdn.uclouvain.be/groups/cms-editors-teco/angelique/fondation-sedes-sapientiae/UCL-TECO-Sedes Sapientiae-Bulletin 2024-WEB.pdf).
- KINDT B., VIDAL-GORÈNE C., DELLE DONNE S., Analyse automatique du grec ancien par réseau de neurones. Évaluation sur le corpus De Thessalonica Capta, in BABELAO, 10-11 (2022), p. 525-550 (https://ojs.uclouvain.be/index.php/babelao/article/view/65073).
- KINDT B., VIDAL-GORÈNE C., From manuscript to tagged corpora. An automated process for Ancient Armenian or other under resourced languages of the Christian East, in Armeniaca. International Journal of Armenian Studies, 1 (2022), p. 73-96 (https://edizionicafoscari.unive.it/en/edizioni4/riviste/armeniaca/2022/1/from-manuscript-to-tagged-corpora/).
- VIDAL-GORÈNE C., CAFIERO F., KINDT B., Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac, 2025, published online on the HAL Science ouverte portal (https://hal.science/hal-05119485).
- VIDAL-GORÈNE C., La reconnaissance automatique d'écriture à l'épreuve des langues peu dotées, Programming Historian en français, 5 (2023) (https://doi.org/10.46430/phfr0023).
- VIDAL-GORÈNE C., Reconhecimento automático de manuscritos para o teste de idiomas não latinos, O Programming Historian em português, 5 (2024) (https://doi.org/10.46430/phpt0046).
Files
Name | Size | Checksum |
---|---|---|
PG.zip | 47.8 MB | md5:7b8d398f7699859e50a9db75bd61de25 |
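The MD5 checksum listed for PG.zip can be used to check that a download is complete and unaltered. Below is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of_file` and `verify` are illustrative, and the local path `PG.zip` in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# Checksum published in the Files table for PG.zip.
EXPECTED_MD5 = "7b8d398f7699859e50a9db75bd61de25"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

After downloading the archive, calling `verify("PG.zip")` returns `True` when the local file matches the published checksum and `False` otherwise.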
Additional details
Related works
- Is part of: Dataset, https://github.com/calfa-co/Patrologia-Graeca (URL)
Software
- Repository URL: https://github.com/calfa-co/Patrologia-Graeca