Published February 12, 2026 | Version v1
Dataset Open

Concordance Datasets of Ukrainian Medical and Crisis Lexemes in the GRAC-18 Corpus

  • 1. Lviv Polytechnic National University
  • 2. Comenius University Bratislava

Description

This dataset contains a curated collection of concordances extracted from the GRAC-18 Ukrainian Language Corpus for a set of medical and crisis-related lexemes: вірус, діагноз, імунітет, інфекція, інфікований, пульс, симптом, хвороба.

The resource was created to support research on the semantic transformation and metaphorization of medical terminology in contemporary Ukrainian media discourse.

Background and purpose

In recent years, medical vocabulary has increasingly migrated from specialized medical communication into public, political, economic, and socio-cultural discourse. Words traditionally used in clinical contexts are now frequently employed metaphorically to describe social crises, political processes, information flows, and collective behavior.

This dataset enables corpus-based investigation of:

  • semantic shift and metaphorization of medical lexemes
  • medicalization of media and public discourse
  • crisis language in Ukrainian communication
  • interdisciplinary research in linguistics, digital humanities, and computational social science

The dataset was compiled as part of research on the medicalization of contemporary Ukrainian media discourse and semantic transformation of crisis lexemes (2014–2024).

Data source

All concordances were extracted from GRAC-18 (General Regionally Annotated Corpus of Ukrainian, https://uacorpus.org/en), a large representative corpus of modern Ukrainian covering journalistic, academic, literary, and online texts.

Dataset contents

The dataset consists of eight CSV files. Each file contains concordance lines for one lexeme. Typical columns include:

  • keyword (node word)
  • left context
  • right context
  • source text metadata (when available)
  • concordance identifier/index

The structure allows both qualitative linguistic analysis and quantitative computational processing.
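As an illustration of the computational side, the following minimal Python sketch reads concordance rows and rebuilds each keyword-in-context line. The column names `left_context`, `keyword`, and `right_context` are assumptions made for this example, and the two sample rows are invented; the real header names should be taken from the first row of the actual files.

```python
import csv
import io

# Hypothetical column names; check the header row of the actual CSV files.
LEFT, NODE, RIGHT = "left_context", "keyword", "right_context"


def read_concordance(stream):
    """Yield one rebuilt keyword-in-context line per CSV row."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield f"{row[LEFT].strip()} [{row[NODE].strip()}] {row[RIGHT].strip()}"


# Invented two-row sample standing in for a real file (not taken from the dataset).
sample = io.StringIO(
    "left_context,keyword,right_context\n"
    "у країні поширюється,вірус,корупції\n"
    "лікар поставив,діагноз,пацієнту\n"
)

kwic_lines = list(read_concordance(sample))
for line in kwic_lines:
    print(line)
```

To read a downloaded file instead of the sample, pass `open(path, encoding="utf-8", newline="")` to `read_concordance`, since the files are UTF-8 encoded.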

Possible research uses

The dataset can be used for:

  • corpus linguistics and discourse analysis
  • metaphor and semantic change research
  • sentiment and framing analysis
  • training and evaluation of NLP models for Ukrainian
  • digital humanities and media studies
  • crisis communication research
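As one possible starting point for such analyses, the sketch below counts the words that occur within a small window on either side of the node word. It is a simple frequency count, not a statistical association measure, and the function name `collocates` and the sample rows are invented for illustration.

```python
import re
from collections import Counter


def collocates(rows, window=3):
    """Count tokens within `window` words to the left and right of the node.

    `rows` is an iterable of (left_context, right_context) string pairs.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    counts = Counter()
    for left, right in rows:
        left_tokens = re.findall(r"\w+", left.lower())
        right_tokens = re.findall(r"\w+", right.lower())
        counts.update(left_tokens[-window:])   # nearest words before the node
        counts.update(right_tokens[:window])   # nearest words after the node
    return counts


# Invented sample contexts (not taken from the dataset).
rows = [
    ("в суспільстві поширюється", "байдужості та страху"),
    ("небезпечний", "байдужості"),
]
counts = collocates(rows, window=2)
print(counts.most_common(3))
```

The `\w+` pattern splits Ukrainian words at apostrophes, so a tokenizer suited to Ukrainian would be preferable for anything beyond a quick look.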

Methodological note

The concordances were extracted using corpus query tools and subsequently cleaned and structured into CSV format for reproducibility and reuse. The dataset preserves authentic corpus contexts without manual reinterpretation.

Language: Ukrainian.

File format: CSV (UTF-8 encoding).

Reuse

The dataset is intended for academic and educational use, including replication studies, corpus analysis, and interdisciplinary research.

Files

concordance_grac18_вірус.csv

Files (118.2 MB)

Checksum                                Size
md5:c497e3a87d81133e74f7c719ad31e032    22.4 MB
md5:5fa38adf48ca7c516ebed3988a680ec8    10.7 MB
md5:5087180217837f48022f5dee37c168d0    1.0 MB
md5:668290912d58fe63e628b9d9b759c6d2    11.7 MB
md5:043374e28ed0b070686c2770a35f0ada    39.5 MB
md5:51c839e76e108b8439f3e5fcba169950    6.7 MB
md5:d81f153d7ebf264926e960b755fcad7e    15.3 MB
md5:966249f714ae39dbaf418e31219aef46    10.8 MB
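
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of` and `matches_record` are made up for this example.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For a downloaded file, call `matches_record(path, "md5:...")` with the checksum shown for that file on the record page; a result of `False` suggests an incomplete or altered download.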

Additional details

Dates

Collected
2026-02-05