Published July 17, 2025 | Version v1
Poster Open

Detecting Divergent Language Use in Russian Media During the Russo-Ukrainian War: Steps Towards Interpretable Propaganda Detection and Analysis

  • 1. Saarland University

Description

Introduction and related work

The Russo-Ukrainian War has intensified the need to understand media manipulation and its societal impacts. NLP methods for propaganda detection often rely on transformer-based models, which require annotated data and lack transparency (cf. Da San Martino et al. 2021, Park et al. 2022). We instead propose applying Kullback-Leibler Divergence (KLD; Kullback / Leibler 1951), a measure that has been widely used for analyzing language variation. It allows us to detect the features that distinguish one text type from another, providing highly interpretable results.

The Russian government has used various narratives to gain support from its population, e.g., euphemistic choices such as replacing "war" with "special military operation" (cf. Park et al. 2022, Solopova et al. 2023, Ustyianovych / Barbosa 2024). The WarMM-2022 corpus (Alyukov et al. 2023) includes state-controlled media targeted at regime supporters and social media used by people with undefined or opposing political views. Using this corpus, Alyukov et al. (2024) showed that the two text types use different propaganda frames: normalization (downplaying the effects of the war on everyday life in Russia) on state media, which aims to demobilize the population, and disinformation (presenting news from Ukraine and the West as fake) on social media, a more mobilizational approach.

In this work, we explore the linguistic means by which propaganda frames are realized, analyzing how language use differs between Russian state-controlled media and social media and how these differences might reflect propaganda strategies (e.g., the use of euphemisms).

Methodology

The WarMM-2022 corpus is a collection of 1.7M posts on the Russo-Ukrainian War with two sub-corpora: state-controlled mass media (24.4M tokens of press and 1.7M tokens of TV transcripts, from February to September 2022) and social media posts with limited governmental control (268.4M tokens, from July to September 2022).

To assess the linguistic differences between state and social media in the WarMM-2022 corpus, we employ KLD, which quantifies the divergence between two probability distributions. We apply KLD to content words, comparing their probability distributions in state-controlled and social media texts. KLD measures the additional bits needed to encode data from one distribution with a code optimized for the other; each word's share of that total shows how much it contributes to the overall divergence, which helps identify the features that distinguish the two text types. Unlike simple frequency-based methods, KLD captures even low-frequency yet distinctive linguistic variations (Degaetano-Ortlieb / Teich 2022).
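
As a rough illustration of the measure described above, the following Python sketch computes each word's contribution to D(P || Q) as p(w) * log2(p(w) / q(w)) and sums the contributions to obtain the total divergence in bits. It is not the implementation used for the poster: the function names, the add-alpha smoothing, the toy token lists, and the choice of state media as P and social media as Q are all assumptions made for the example, and content-word filtering is assumed to have happened beforehand.

```python
import math
from collections import Counter


def pointwise_kld(tokens_p, tokens_q, alpha=1.0):
    """Per-word contributions to D(P || Q) in bits.

    Add-alpha smoothing (an assumption of this sketch) keeps q(w) > 0
    for words that occur in only one of the two samples.
    """
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    vocab = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + alpha * len(vocab)
    total_q = sum(counts_q.values()) + alpha * len(vocab)

    contributions = {}
    for word in vocab:
        p = (counts_p[word] + alpha) / total_p
        q = (counts_q[word] + alpha) / total_q
        contributions[word] = p * math.log2(p / q)
    return contributions


def kld(tokens_p, tokens_q, alpha=1.0):
    """Total divergence D(P || Q): the sum of all per-word contributions."""
    return sum(pointwise_kld(tokens_p, tokens_q, alpha).values())


# Hypothetical toy data standing in for the two sub-corpora.
state_tokens = ["referendum", "operation", "operation", "region"]
social_tokens = ["war", "war", "truth", "operation"]

contributions = pointwise_kld(state_tokens, social_tokens)
ranking = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
for word, value in ranking:
    print(f"{word}\t{value:.3f}")
```

Words with large positive contributions are typical of the first sample relative to the second. Because KLD is asymmetric, words typical of the second sample would be found by swapping the arguments.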

We argue that KLD offers interpretable, reproducible insights into propaganda-related language variation, complementing opaque neural models whose limitations hinder linguistic analysis and adaptability across domains.

Results

The analysis reveals distinct patterns of language use in state-controlled media and social media in the context of the Russo-Ukrainian War. Terms related to geographical entities, specifically those Russia occupied, and to referendums [1] are more prominent in state media, reflecting a territorial control narrative consistent with a normalization frame. Social media shows a higher contribution of direct war-related terminology, indicative of a mobilizational approach. In contrast, euphemisms for war are more characteristic of state media, aligning with efforts to downplay the invasion. Additionally, words such as "propaganda" and "truth" exhibit high KLD values on social media, pointing to the disinformation frame. Finally, there is evidence suggesting the use of homoglyphs (such as replacing the Cyrillic "у" with the Latin "y") as a potential strategy to circumvent content moderation and spread disinformation on social platforms.
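
The poster does not say how the homoglyph cases were identified. As a purely illustrative sketch, one simple heuristic is to flag tokens that mix Cyrillic and Latin letters, using Python's standard `unicodedata` module. The helper names `script_of` and `is_mixed_script` and the example strings are hypothetical.

```python
import unicodedata


def script_of(ch):
    """Return 'Cyrillic' or 'Latin' based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return None


def is_mixed_script(token):
    """True if the token contains both Cyrillic and Latin letters."""
    scripts = {script_of(ch) for ch in token if ch.isalpha()}
    return {"Cyrillic", "Latin"} <= scripts


# Escapes make the otherwise invisible difference explicit.
genuine = "р\u0443сский"  # second letter is Cyrillic "у" (U+0443)
spoofed = "р\u0079сский"  # second letter is Latin "y" (U+0079)

print(is_mixed_script(genuine))
print(is_mixed_script(spoofed))
```

A mixed-script token is only a candidate: legitimate tokens, such as a Latin brand name with a Cyrillic ending, would be flagged as well, so hits from a check like this would still need manual inspection.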

Conclusion

The findings suggest a clear divergence in rhetorical strategies between the two media types. State media predominantly employs demobilizational tactics, characterized by normalization framing and references to geopolitical control, aimed at downplaying the war and reinforcing authority. In contrast, social media content tends to adopt a mobilizational stance, marked by the disinformation frame and the use of direct war-related language. In the future, the analysis will be extended by applying KLD to other linguistic features such as parts of speech and syntactic patterns, as well as examining changes over time. Furthermore, state-of-the-art methods, including surprisal and word embeddings, will be explored to enhance the detection and analysis of propagandistic language and semantic shifts. Overall, this work underscores the potential of open, transparent methodologies to democratize access to knowledge and foster resilience against disinformation, aligning with the values of Open Science and the digital humanities.

[1] These refer to the sham referendums on Russia's annexation of Ukraine's occupied territories conducted in late September 2022.

Files

Detecting divergent language use in Russian Media during the Russo-Ukrainain War.pdf (776.1 kB)

Additional details

Funding

European Commission
CASCADE - Computational Analysis of Semantic Change Across Different Environments 101119511

Dates

Submitted
2024-12-08
Accepted
2025-03-04

References

  • Alyukov, Maxim / Kunilovskaya, Maria / Semenov, Andrei (2023): "Wartime Media Monitor (WarMM-2022): A Study of Information Manipulation on Russian Social Media during the Russia-Ukraine War", in: Degaetano-Ortlieb, Stefania / Kazantseva, Anna / Reiter, Nils / Szpakowicz, Stan (eds.): Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Dubrovnik, Croatia, May 2023: 152–161. DOI: 10.18653/v1/2023.latechclfl-1.17 [08.10.2024].
  • Alyukov, Maxim / Kunilovskaya, Maria / Semenov, Andrei (2024): "Confuse and Normalise: Authoritarian Propaganda in a High-Choice Media Environment and Russia's Invasion of Ukraine", in: Goode, Paul (ed.): Russian Propaganda Today: Challenges, Effectiveness, and Resistance. University of Michigan Press, University of Manchester Press: in print.
  • Da San Martino, Giovanni / Cresci, Stefano / Barrón-Cedeño, Alberto / Yu, Seunghak / Di Pietro, Roberto / Nakov, Preslav (2021): "A Survey on Computational Propaganda Detection", in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI'20), Yokohama, Japan: 4826–4832 [10.12.2024].
  • Degaetano-Ortlieb, Stefania / Teich, Elke (2022): "Toward an Optimal Code for Communication: The Case of Scientific English", in: Corpus Linguistics and Linguistic Theory 18, 1: 175–207. DOI: 10.1515/cllt-2018-0088 [01.10.2024].
  • Kullback, Solomon / Leibler, Richard A. (1951): "On Information and Sufficiency", in: The Annals of Mathematical Statistics 22, 1: 79–86. DOI: 10.1214/aoms/1177729694 [10.12.2024].
  • Park, Chan Young / Mendelsohn, Julia / Field, Anjalie / Tsvetkov, Yulia (2022): "Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media", in: Goldberg, Yoav / Kozareva, Zornitsa / Zhang, Yue (eds.): Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, UAE: 5209–5235. DOI: 10.18653/v1/2022.findings-emnlp.382 https://aclanthology.org/2022.findings-emnlp.382/ [10.03.2025].
  • Solopova, Veronika / Benzmüller, Christoph / Landgraf, Tim (2023): "The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective", in: Romanyshyn, Mariana (ed.): Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), Dubrovnik, Croatia: 40–48. DOI: 10.18653/v1/2023.unlp-1.5 [16.10.2024].
  • Ustyianovych, Taras / Barbosa, Denilson (2024): "Instant Messaging Platforms News Multi-Task Classification for Stance, Sentiment, and Discrimination Detection", in: Romanyshyn, Mariana / Romanyshyn, Nataliia / Hlybovets, Andrii / Ignatenko, Oleksii (eds.): Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, Torino, Italia: 30–40 [16.10.2024].