TRACES Hierarchical Classification of Categories of Linguistic and Psycholinguistic Markers of Deception with Bulgarian Expression Lists for Disinformation Detection

Irina Temnikova; Ruslana Margova; Ivo Dzhumerov; Hristiana Nikolaeva

doi:10.5281/zenodo.7656905

Published February 20, 2023 | Version 1.0

Resource type: Other | Access: Restricted

1. GATE Institute

These resources were created within Project TRACES (more information: https://traces.gate-ai.eu/). They contain a hierarchical classification with 97 fine-grained and 18 coarse-grained categories of linguistic and psycholinguistic markers that signal deception. The markers were collected from related work (see the References section below), mostly on the English language. Most categories are accompanied by proposed methods for detecting them that take into account the specifics of the Bulgarian language. As such, the classification can be adapted to other languages. The resource also contains lists of Bulgarian expressions, which are meant to be looked up in texts in order to detect some of the categories of markers. One of the lists contains attention-attracting expressions, which were collected from Bulgarian social media messages on Covid-19, although some of them are universal.
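The look-up of expressions in texts can be illustrated with a minimal Python sketch. It is not the TRACES implementation: the function name `count_markers`, the category key `attention_attracting`, and the three Bulgarian expressions are hypothetical placeholders chosen for illustration and are not taken from the restricted lists. The sketch assumes case-insensitive, whole-word matching; since Bulgarian is an inflected language, a real pipeline would probably also need lemmatization or inflected variants in the lists.

```python
import re
from collections import Counter

# Hypothetical placeholder entries for illustration only;
# these are NOT taken from the TRACES expression lists.
EXPRESSION_LISTS = {
    "attention_attracting": ["внимание", "спешно", "споделете"],
}


def count_markers(text, expression_lists):
    """Count case-insensitive whole-word matches of each expression, per category."""
    counts = Counter()
    lowered = text.lower()
    for category, expressions in expression_lists.items():
        for expression in expressions:
            # Lookarounds keep the match from starting or ending inside a longer word.
            pattern = r"(?<!\w)" + re.escape(expression.lower()) + r"(?!\w)"
            counts[category] += len(re.findall(pattern, lowered))
    return dict(counts)


sample = "ВНИМАНИЕ! Спешно споделете това съобщение."
print(count_markers(sample, EXPRESSION_LISTS))
# {'attention_attracting': 3}
```

Per-category counts of this kind could then serve as features for a downstream classifier, in keeping with the condition that the markers are not conclusive evidence on their own.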

These resources can be used to identify disinformation, understood as “false or misleading content that is spread with an intention to deceive or secure economic or political gain and which may cause public harm”, as defined in the Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee, and the Committee of the Regions on the European Democracy Action Plan.

For more information, see our paper:

Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań. Poland.

Notes

The project TRACES has indirectly received funding from the European Union's Horizon 2020 research and innovation action programme, via the AI4Media Open Call #1 issued and executed under the AI4Media project (Grant Agreement no. 951911).

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

These resources are provided under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) with the additional terms below. After reading the terms, please state that you have read and accept the conditions, and describe the entity on whose behalf you are applying (if any, e.g. academic institution, company, government agency) and your intended use of the resources.

The TRACES team members will review your application and decide whether to grant you access.

If you have questions, please contact us at: irina.temnikova@gmail.com

Conditions for using the dataset TRACES_Dts1_markersClassification-listsExpressions-LinguisticPsycholinguisticBulgarianCategoriesAndLists_1.0:

To be granted access to these resources, in line with applicable legislation, including but not limited to the General Data Protection Regulation (GDPR), the Artificial Intelligence Act (AI Act, current draft as of 01 November 2022, pending adoption and entry into force), and the TRACES Project Data Management Plan, you must agree to and abide by the following terms and conditions if you want to download or use the classification and the lists of expressions:

The linguistic markers are currently being developed and are provided purely and solely for scientific purposes. They cannot be used as conclusive evidence, as arguments on the merits of the dataset, as evidence in judicial or administrative proceedings or in any other way not directly related to Project TRACES.
No legal action should or could be taken against the authors of texts in which any of the linguistic markers or combinations of them have been discovered.
The resources are not suitable to be used and shall not be used for governmental or public authority purposes, including for investigations, government surveillance, intelligence work, analysis, criminal investigation, court or administrative proceedings.
The Project Sponsors (AI4Media, F6S, and the European Commission), the members of the TRACES team, users, or subjects shall not be liable or otherwise responsible for any consequences and/or damages (including pecuniary or moral damages) arising out of or in relation to the Project, the data collected, the classification of markers and the expression lists, and the methods used for their analysis and/or the results/outcomes.
This notice, as well as all the activities of the TRACES Project and of its Project Sponsors, team members, users, or subjects, including any contractual and/or non-contractual liability, are governed exclusively by the European Union laws and by the laws of the Republic of Bulgaria.
You agree to provide attribution to the TRACES project in the following format:
- The TRACES project (https://traces.gate-ai.eu/)
- Dataset name: TRACES_Dts1_markersClassification-listsExpressions-LinguisticPsycholinguisticBulgarianCategoriesAndLists_1.0
- Data source: Linguistic and Psycholinguistic literature on deception.
- Research article to cite: Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań. Poland.
- Link to the original dataset: https://zenodo.org/deposit/7656905

Additional details

Funding: European Commission, AI4Media – A European Excellence Centre for Media, Society and Democracy (Grant Agreement no. 951911)

	All versions	This version
Views	223	223
Downloads	1	1
Data volume	1.9 MB	1.9 MB
