Published March 9, 2023 | Version 1.0
Software Restricted

TRACES Model for Detecting Automatically Generated Bulgarian Texts with the GPT-2 and ChatGPT models with F1-Score 0.88

Authors/Creators

Description

The purpose of this model is to indicate whether a given Bulgarian text may have been automatically generated by the GPT-2 or ChatGPT models.

It outputs a probability for the class "0" (the text was written by a human) and for the class "1" (the text was potentially generated by GPT-2 or ChatGPT).

It can be combined with models that recognize untrue information, misinformation, or disinformation in order to identify textual deepfakes.

The model has been trained on Bulgarian social media messages automatically generated by GPT-2 and ChatGPT, starting from examples of Bulgarian social media messages. It uses a TF-IDF vectorizer, which is also supplied.
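As a rough illustration of how a TF-IDF vectorizer and a classifier fit together, the sketch below trains a toy pipeline with scikit-learn. Everything in it is an assumption made for illustration: this record does not state the classifier type, the file names, or the vectorizer settings, so `LogisticRegression`, the toy texts, and the default `TfidfVectorizer` parameters are stand-ins and not the released model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration only (not the TRACES training set).
# Label 0 = written by a human, label 1 = potentially machine-generated.
train_texts = [
    "Днес времето в София е много хубаво",
    "Отивам на кафе с приятели след работа",
    "Това е автоматично генериран текст за времето",
    "Това е автоматично генериран текст за кафето",
]
train_labels = [0, 0, 1, 1]

# Turn each text into a sparse vector of TF-IDF weights.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Stand-in classifier (assumption): any model exposing predict_proba would do.
classifier = LogisticRegression()
classifier.fit(X_train, train_labels)

# New texts must go through the same fitted vectorizer before prediction.
new_text = ["Това е текст за времето в София"]
proba = classifier.predict_proba(vectorizer.transform(new_text))[0]

# Columns follow classifier.classes_, which is [0, 1] here.
p_human, p_generated = proba[0], proba[1]
print(f"P(class 0, human)     = {p_human:.3f}")
print(f"P(class 1, generated) = {p_generated:.3f}")
```

With the released files, one would presumably load the supplied vectorizer instead of fitting a new one, so that the features match those seen during training.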

The model achieves these values:

Accuracy: 0.8860630722278738
Precision: 0.87
Recall: 0.9024896265560166
F1-score: 0.8859470468431772 
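
The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the variable names are illustrative.

```python
# Values copied from the metrics listed above.
precision = 0.87
recall = 0.9024896265560166
reported_f1 = 0.8859470468431772

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"Recomputed F1: {f1:.4f}")
print(f"Reported F1:   {reported_f1:.4f}")
```

The recomputed and reported values agree to four decimal places, so the listed metrics are consistent with one another.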

Notes

This model was created within the TRACES project (https://traces.gate-ai.eu/), which has indirectly received funding from the European Union's Horizon 2020 research and innovation action programme, via the AI4Media Open Call #1 issued and executed under the AI4Media project (Grant Agreement no. 951911).

Files

Restricted

The record is publicly accessible, but files are restricted.

Request access

You need to satisfy these conditions in order for this request to be accepted:

When you request access, you agree with the following Conditions of Use:

  1. The model is distributed under the BigScience Open RAIL-M License (read the license here: https://static1.squarespace.com/static/5c2a6d5c45776e85d1482a7e/t/6308bb4bba3a2a045b72a4b0/1661516619868/BigScience+Open+RAIL-M+License.pdf).
  2. You are required to cite this model using this link: https://zenodo.org/record/7713672 and to mention that it was developed during the TRACES project (https://traces.gate-ai.eu/).
  3. You agree and promise to abide by the following conditions:
  • This is a Machine Learning (Artificial Intelligence) model. For this reason:

  • The analysis that the model runs on the input texts shows only the potential presence of Bulgarian texts automatically generated by the models GPT-2 and ChatGPT.

  • This indication should be taken with a lower degree of confidence (a certain likelihood, but not certainty).

  • No legal action should be taken, solely based on the results of this tool, against the authors of texts that the tool identifies as potentially representing Bulgarian texts automatically generated by the models GPT-2 and ChatGPT.

  • The methods used and the results obtained are not suitable for governmental or public authority purposes, including investigations, intelligence work, criminal investigations, and court or administrative proceedings.

  • The predictions of potential automatic generation that the model provides for the input texts are not statements, beliefs, or affirmations of the Project's authors, researchers, or participants.

  • The TRACES Project Sponsors, Researchers, the model author/owner, users, or subjects shall not be liable or otherwise responsible for any damages (including pecuniary or moral damages) arising out of or in relation to the model analysis, the method used, and/or the results/outcomes.

Additional details

Funding

European Commission
AI4Media - A European Excellence Centre for Media, Society and Democracy (Grant Agreement no. 951911)