Published February 3, 2025 | Version v1
Dataset Open

Ukrainian Epigraphic Corpus: Academic and Web-Based Texts (20th–21st Century)

  • 1. Formation Continue UNIL-EPFL
  • 2. Dragomanov Ukrainian State University

Description

This dataset comprises a corpus of Ukrainian epigraphic texts collected from academic publications, conference proceedings, and web-based sources. The corpus is designed to support linguistic analysis, term extraction, and the development of a Simple Knowledge Organization System (SKOS) vocabulary for Ukrainian epigraphy.

The corpus contains 292 documents totaling over 1.29 million tokens and 778,104 words. The texts date from the second half of the 20th century to 2024 and cover several regions of Ukraine, including Kyiv, Halychyna, and Chernihiv. Sources range from books and monographs to web-based discussions of epigraphy, so the corpus combines academic material with contemporary usage.

The data were processed in Sketch Engine, where tokenization, lemmatization, and part-of-speech tagging were applied to support term identification and frequency analysis. The corpus is intended for researchers in epigraphy, linguistics, digital humanities, and terminology studies.
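For readers who want a quick look at term frequencies without Sketch Engine, the following is a minimal sketch in Python. It is not the pipeline used to build the corpus: the regular-expression tokenizer, the function names `word_frequencies` and `top_terms`, and the assumption that the text is UTF-8 plain text are all illustrative, and the sketch does no lemmatization or part-of-speech tagging, so its counts will differ from the figures reported here.

```python
import re
from collections import Counter

# Assumption: a simple regex tokenizer, NOT Sketch Engine's tokenizer.
# A token is a run of letters, optionally joined by an apostrophe or a hyphen
# (so that Ukrainian forms such as "пам'ятка" stay in one piece).
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’ʼ-][^\W\d_]+)*")


def word_frequencies(text):
    """Return a Counter of lowercased word forms found in `text`."""
    return Counter(match.group(0).lower() for match in TOKEN_RE.finditer(text))


def top_terms(text, n=10):
    """Return the `n` most frequent word forms as (form, count) pairs."""
    return word_frequencies(text).most_common(n)


# Hypothetical usage, assuming the corpus file is UTF-8 encoded:
#   with open("epigraphic_corpus.txt", encoding="utf-8") as fh:
#       print(top_terms(fh.read(), 20))
```

Because the counts are over surface forms rather than lemmas, inflected variants of the same term are counted separately; a lemmatizer would be needed to approximate the term lists produced from the tagged corpus.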

Corpus Structure:

  • Academic Sub-Corpus: 18 documents (books, articles, encyclopedic entries)
  • Web Sub-Corpus: 274 documents (web articles, blogs, project websites)

License:
This dataset is released under the CC BY 4.0 license, allowing for reuse and adaptation with proper attribution.

Citation:
Ukrainian Epigraphic Text Corpus (2024). Available at Zenodo: [Insert DOI]

Files

epigraphic_corpus.txt

Files (12.1 MB)

Checksum Size
md5:48533c63f03d24c63841f472b4bcfc75 12.0 MB
md5:8e2e905dc1c6a4a203aef2149ae85e5f 2.6 kB
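
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative. The listing does not state which checksum belongs to which file name, so pairing the 12.0 MB entry with `epigraphic_corpus.txt` in the usage comment is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:48533c...'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.strip().lower()


# Hypothetical usage (the file-to-checksum pairing is an assumption):
#   matches_checksum("epigraphic_corpus.txt",
#                    "md5:48533c63f03d24c63841f472b4bcfc75")
```

A result of `False` means only that the local file differs from the one the checksum was computed for, for example because of an incomplete download or because the checksum belongs to the other file.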

Additional details

Related works

Is supplement to
Conference proceeding: 10.1007/978-3-031-72440-4_2 (DOI)
Journal: 10.32782/philspu/2024.7.9 (DOI)
Journal: 10.31392/NPU-nc.series9.2024.27.08 (DOI)

Dates

Created
2024

References

  • H. Tamrazyan, E. Boros, and F. Kaplan, 'Developing a Standardised Vocabulary for Ukrainian Epigraphy and Expanding Digital Epigraphic Resources', presented at the International Conference on Theory and Practice of Digital Libraries. Cham: Springer Nature Switzerland, 2024, pp. 13–22. https://doi.org/10.1007/978-3-031-72440-4_2