TRACES Model for Detecting Untrue Bulgarian Texts with F1-Score 0.96

Irina Temnikova

doi:10.5281/zenodo.7713572

Published March 9, 2023 | Version 1.0

Software Restricted

The purpose of this model is to provide an indication of whether a given text in Bulgarian potentially contains untrue information.

It outputs a probability for class "0" (contains TRUE information) and for class "1" (contains UNTRUE information).

It can be combined with models that recognize automatically generated texts in order to identify textual deepfakes.

The model was trained on Bulgarian social media messages, each manually labeled by three Bulgarian journalists as containing or not containing untrue information. It uses a TF-IDF vectorizer, which is also supplied.
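To illustrate what a TF-IDF vectorizer computes, here is a minimal pure-Python sketch. The settings of the supplied vectorizer are not documented in this record, so the formula below (smoothed inverse document frequency followed by L2 normalization, as in scikit-learn's defaults) and the whitespace tokenization are assumptions. The three short documents are made-up placeholders, not training data.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (illustrative only)."""
    # Simplified tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed idf (assumed): ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {term: count * idf[term] for term, count in tf.items()}
        # L2-normalize so every non-empty document vector has length 1.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Placeholder documents, invented for this sketch.
docs = ["котката спи", "кучето спи", "котката тича"]
vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, vec)
```

Terms that occur in fewer documents receive a higher weight, so in the second placeholder document "кучето" outweighs "спи".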

The model achieves these values:

Accuracy: 0.933920704845815
Precision: 0.9324324324324325
Recall: 1.0
F1-score: 0.965034965034965
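
The reported F1-score is consistent with the reported precision and recall. The short check below recomputes it with the standard formula F1 = 2PR / (P + R). It assumes, as is conventional but not stated in this record, that class "1" (untrue) is the positive class.

```python
# Values copied from the list above.
precision = 0.9324324324324325
recall = 1.0

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # 0.965035
```

Under that assumption, a recall of 1.0 means no untrue text in the evaluation set was missed, while a precision of about 0.93 means that roughly 7% of the texts flagged as untrue were false alarms.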

It can be loaded in this way (using Python):

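import pickle

# scikit-learn must be installed so that the pickled objects can be restored.
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the trained SVM classifier. Only unpickle files from a trusted source.
with open('SVM_model_untrue_inform_detection_tfidf_bg_0.96_F1_score.pkl', 'rb') as f1:
    svm_model = pickle.load(f1)

# Load the TF-IDF vectorizer supplied with the model.
with open('tfidf_vectorizer_untrue_inform_detection_tfidf_bg_0.96_F1_score.pkl', 'rb') as f2:
    vectorizer = pickle.load(f2)

# Sample text: "I have no idea where the cookies have disappeared."
new_text = "Нямам представа къде са изчезнали бисквитките."

# transform() expects an iterable of documents, so wrap the single text in a list.
X_test = vectorizer.transform([new_text])

y_pred = svm_model.predict(X_test)
probability_estimate = svm_model.predict_proba(X_test)

if y_pred[0] == 1:
    print("The text contains untrue information")
    print("Probability:", probability_estimate[0][1])
elif y_pred[0] == 0:
    print("The text does not contain untrue information")
    print("Probability:", probability_estimate[0][0])

# Full probability array: one row per text, one column per class.
print("Probability:", probability_estimate)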
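
For more than one text, the same steps can be wrapped in a small helper. The sketch below is not part of the released model: `classify_texts` is a hypothetical name, and because the model files are restricted, the demo trains a tiny stand-in classifier on invented placeholder sentences with arbitrary labels. The linear kernel and all other settings are assumptions. With the real files, pass the unpickled `vectorizer` and `svm_model` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def classify_texts(texts, vectorizer, model):
    """Return the predicted label and its probability for each text."""
    if isinstance(texts, str):
        texts = [texts]  # accept a single string as well as a list
    X = vectorizer.transform(texts)
    predictions = model.predict(X)
    probabilities = model.predict_proba(X)
    # Columns of predict_proba follow the order of model.classes_.
    classes = list(model.classes_)
    results = []
    for text, label, row in zip(texts, predictions, probabilities):
        results.append({
            "text": text,
            "label": int(label),
            "probability": float(row[classes.index(label)]),
        })
    return results


# Stand-in objects for the demo only: invented sentences, arbitrary labels.
toy_texts = [
    "времето днес е слънчево", "утре ще вали дъжд",
    "котката спи на дивана", "кучето тича в парка",
    "луната е направена от сирене", "земята е плоска като тепсия",
    "рибите летят над облаците", "слънцето изгрява от запад",
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

toy_vectorizer = TfidfVectorizer()
toy_model = SVC(kernel="linear", probability=True, random_state=0)
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

results = classify_texts(
    ["котката тича в парка", "луната е плоска"], toy_vectorizer, toy_model
)
for item in results:
    print(item)
```

Looking up the column through `classes_` avoids relying on the class order, and the reported probability is always the one belonging to the predicted label.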

Notes

This model was created within the TRACES project (https://traces.gate-ai.eu/), which has indirectly received funding from the European Union's Horizon 2020 research and innovation action programme, via the AI4Media Open Call #1 issued and executed under the AI4Media project (Grant Agreement no. 951911).

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

When you request access, you agree with the following Conditions of Use:

The model is distributed under the BigScience Open RAIL-M License (read the license here: https://static1.squarespace.com/static/5c2a6d5c45776e85d1482a7e/t/6308bb4bba3a2a045b72a4b0/1661516619868/BigScience+Open+RAIL-M+License.pdf)
You are required to cite this model using this link: https://zenodo.org/record/7713572 and to mention that it was developed during the TRACES project (https://traces.gate-ai.eu/).
You agree and promise to abide by the following conditions:

This is a Machine Learning (Artificial Intelligence) model. For this reason:
The analysis run by the model on the analyzed texts shows only the potential presence of untrue information.
This indication should be treated with limited confidence: it expresses a likelihood, not a certainty.
No legal action should be taken against the authors of texts identified by the tool as potentially containing untrue information solely on the basis of the results of this tool.
The methods used and the results obtained are not suitable for governmental or public authority purposes, including investigations, intelligence work, criminal investigation, and court or administrative proceedings.
The predictions of potential fakeness or wrongness of the texts that the model provides are not statements, beliefs, or affirmations of the Project's authors, researchers, or participants.
The TRACES Project Sponsors, Researchers, the model author/owner, users, or subjects shall not be liable or otherwise responsible for any damages (including pecuniary or moral damages) arising out of or in relation to the model analysis, the method used, and/or the results or outcomes.

Additional details

European Commission
AI4Media - A European Excellence Centre for Media, Society and Democracy (Grant Agreement no. 951911)
