TRACES Bulgarian Telegram Dataset Annotated with Linguistic Markers of Lies
Creators
Contributors
Data curators:
- 1. GATE Institute
- 2. GATE Institute
Description
This dataset was created within Project TRACES (more information: https://traces.gate-ai.eu/). It contains 8791 anonymized Telegram social media posts written in Bulgarian. The dataset is annotated with general information (named entities, part-of-speech tags, sentence length, etc.) and with specific markers signaling details. It can be used for general purposes or for building applications that detect lies, manipulation, and disinformation.
Note: this dataset is not fact-checked; the social media messages were retrieved via keywords. For fact-checked datasets, see our other datasets.
The social media posts were collected via Telegram Desktop in June-July 2022.
Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper:
Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023). New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.