Published December 9, 2025 | Version v1
Dataset Open

Dataset and Code for: Tuning of Language Models in Eastern European Languages on Twitter/X

  • 1. University of Ostrava

Description

This dataset contains the data and experimental code used in the study:

Filip, T., Pavlíček, M., & Sosík, P. (2025).
"Tuning of language models in Eastern European languages on Twitter/X."
In Proceedings of the Workshop on Artificial Intelligence and Language Technologies (Vol. 4092). CEUR-WS.

The dataset includes:
– text data in multiple Eastern European languages of the Visegrád Group (V4), collected from Twitter/X,
– preprocessing and cleaning scripts,
– experimental code used for training and evaluation,
– evaluation outputs (metrics, tables, plots),
– all resources required to reproduce the results presented in the article.

This dataset serves as supplementary material to the publication referenced above.

Funding acknowledgment:
This work has been produced with the financial support of the European Union under the project "Biography of Fake News with a Touch of AI: Dangerous Phenomenon through the Prism of Modern Human Sciences", project no. CZ.02.01.01/00/23_025/0008724, via the Operational Programme Jan Amos Komenský (OP JAK).

Files

zrec-paper-a-study-on-eastern-european-v4-languages-main.zip (2.2 MB)
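
The internal layout of the archive is not described on this record, so a reasonable first step after downloading is to list its contents. The following is a minimal sketch using only the Python standard library. It assumes the archive has been downloaded manually and saved in the working directory under its original file name; the helper name list_archive is hypothetical and is not part of the published code.

```python
import zipfile
from pathlib import Path

# Assumption: the archive was downloaded by hand and kept under its original name.
ARCHIVE = Path("zrec-paper-a-study-on-eastern-european-v4-languages-main.zip")


def list_archive(path):
    """Return sorted (file name, uncompressed size in bytes) pairs for the archive."""
    with zipfile.ZipFile(path) as zf:
        return sorted(
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries, keep files only
        )


if __name__ == "__main__":
    if ARCHIVE.exists():
        for name, size in list_archive(ARCHIVE):
            print(f"{size:>10}  {name}")
    else:
        print(f"{ARCHIVE} not found; download it from this record first")
```

Listing the entries before extracting makes it easier to locate the data, preprocessing scripts, and evaluation outputs named in the description without assuming any particular directory structure.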

Additional details

Funding

  • Ministry of Education, Youth and Sports
    Biography of Fake News with a Touch of AI: Dangerous Phenomenon through the Prism of Modern Human Sciences, project no. CZ.02.01.01/00/23_025/0008724
  • Ministry of Education, Youth and Sports
    REFRESH – Research Excellence For REgion Sustainability and High-tech Industries, project no. CZ.10.03.01/00/22_003/0000048
  • Silesian University in Opava
    SGS/9/2024

Dates

Issued: 2025-12-09

References

  • Filip, T., Pavlíček, M., & Sosík, P. (2025). Tuning of language models in Eastern European languages on Twitter/X. In Proceedings of the Workshop on Artificial Intelligence and Language Technologies (Vol. 4092). CEUR-WS. https://doi.org/10.5281/zenodo.17723755