Concordance Datasets of Ukrainian Medical and Crisis Lexemes in the GRAC-18 Corpus
Description
This dataset contains a curated collection of concordances extracted from the GRAC-18 Ukrainian Language Corpus for a set of medical and crisis-related lexemes: вірус, діагноз, імунітет, інфекція, інфікований, пульс, симптом, хвороба.
The resource was created to support research on the semantic transformation and metaphorization of medical terminology in contemporary Ukrainian media discourse.
Background and purpose
In recent years, medical vocabulary has increasingly migrated from specialized medical communication into public, political, economic, and socio-cultural discourse. Words traditionally used in clinical contexts are now frequently employed metaphorically to describe social crises, political processes, information flows, and collective behavior.
This dataset enables corpus-based investigation of:
- semantic shift and metaphorization of medical lexemes
- medicalization of media and public discourse
- crisis language in Ukrainian communication
- interdisciplinary research in linguistics, digital humanities, and computational social science
The dataset was compiled as part of research on the medicalization of contemporary Ukrainian media discourse and the semantic transformation of crisis lexemes (2014–2024).
Data source
All concordances were extracted from GRAC-18 (General Regionally Annotated Corpus of Ukrainian, https://uacorpus.org/en), a large representative corpus of modern Ukrainian covering journalistic, academic, literary, and online texts.
Dataset contents
The dataset consists of eight CSV files. Each file contains concordance lines for one lexeme. Typical columns include:
- keyword (node word)
- left context
- right context
- source text metadata (when available)
- concordance identifier/index
The structure allows both qualitative linguistic analysis and quantitative computational processing.
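As a minimal sketch of how one of the CSV files might be read, the following Python snippet uses only the standard library. The column names `left_context`, `keyword`, and `right_context` are assumptions made for illustration; the actual header names in the files may differ and should be checked first. The sample row is invented and is not taken from the dataset.

```python
import csv
import io

# Hypothetical column names; check the header row of the real files.
LEFT, NODE, RIGHT = "left_context", "keyword", "right_context"

def read_concordance(handle):
    """Return a list of (left, node, right) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(row[LEFT].strip(), row[NODE].strip(), row[RIGHT].strip())
            for row in reader]

def format_kwic(lines, width=30):
    """Align concordance lines on the node word (keyword-in-context view)."""
    return [f"{left[-width:]:>{width}}  {node}  {right[:width]}"
            for left, node, right in lines]

# Invented in-memory sample standing in for a real file; a real file
# would be opened with open(path, encoding="utf-8", newline="").
sample = io.StringIO(
    "left_context,keyword,right_context\n"
    "суспільство вразив,вірус,байдужості\n"
)
rows = read_concordance(sample)
kwic = format_kwic(rows)
```

Truncating the left context from its end and the right context from its start keeps the words nearest the node word visible, which is usually what matters when scanning concordance lines by eye.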
Possible research uses
The dataset can be used for:
- corpus linguistics and discourse analysis
- metaphor and semantic change research
- sentiment and framing analysis
- training and evaluation of NLP models for Ukrainian
- digital humanities and media studies
- crisis communication research
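For the quantitative uses listed above, one simple starting point is to count which words occur immediately to the right of the node word. The sketch below is only an illustration under stated assumptions: it takes a plain list of right-context strings, tokenizes on whitespace (a crude approximation for Ukrainian, with no lemmatization), and uses invented example strings rather than data from the corpus.

```python
from collections import Counter

def right_collocates(right_contexts, top_n=3):
    """Count the first token of each right context; return the most common."""
    counts = Counter()
    for text in right_contexts:
        tokens = text.lower().split()
        if tokens:
            # Strip surrounding punctuation from the first token only.
            word = tokens[0].strip(".,!?:;\"'«»()")
            if word:
                counts[word] += 1
    return counts.most_common(top_n)

# Invented right contexts for illustration, not taken from the dataset.
examples = ["корупції в державі", "Корупції, який", "страху", ""]
top = right_collocates(examples)
```

Because inflected forms are counted separately here, a lemmatizer for Ukrainian would be needed before drawing conclusions about collocational patterns.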
Methodological note
The concordances were extracted using corpus query tools and subsequently cleaned and structured into CSV format for reproducibility and reuse. The dataset preserves authentic corpus contexts without manual reinterpretation.
Language: Ukrainian.
File format: CSV (UTF-8 encoding).
Reuse
The dataset is intended for academic and educational use, including replication studies, corpus analysis, and interdisciplinary research.
Files
Total size: 118.2 MB
Example file name: concordance_grac18_вірус.csv
| MD5 checksum | Size |
|---|---|
| md5:c497e3a87d81133e74f7c719ad31e032 | 22.4 MB |
| md5:5fa38adf48ca7c516ebed3988a680ec8 | 10.7 MB |
| md5:5087180217837f48022f5dee37c168d0 | 1.0 MB |
| md5:668290912d58fe63e628b9d9b759c6d2 | 11.7 MB |
| md5:043374e28ed0b070686c2770a35f0ada | 39.5 MB |
| md5:51c839e76e108b8439f3e5fcba169950 | 6.7 MB |
| md5:d81f153d7ebf264926e960b755fcad7e | 15.3 MB |
| md5:966249f714ae39dbaf418e31219aef46 | 10.8 MB |
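The MD5 checksums listed for the files can be used to confirm that a download is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the checksum passed in must be copied from the entry for the file being checked.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:<checksum from the listing>")` returns `True` only when the local file's digest equals the listed value.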
Additional details
Dates
- Collected: 2026-02-05