AuTexTification Dataset (Full data)

Areg Sarvazyan; José Ángel González; Marc Franco; Francisco Manuel Rangel; María Alberta Chulvi; Paolo Rosso

doi:10.5281/zenodo.7956207

Published May 22, 2023 | Version v1

Dataset Restricted

1. Symanto
2. Universitat Politècnica de València

Datasets of the AuTexTification shared task at IberLEF 2023. This task aims to boost research on the detection of text generated automatically by text generation models. Participants must develop models that exploit clues about linguistic form and meaning to distinguish automatically generated text from human text.

This dataset includes the training and test splits with labels for all the subtasks and languages. Additionally, each file includes the domain, the model and the prompt used to generate each sample. The model label mapping for subtask 2 is: {"A": "bloom-1b7", "B": "bloom-3b", "C": "bloom-7b1", "D": "babbage", "E": "curie", "F": "text-davinci-003"}
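The subtask 2 mapping above can be applied programmatically. The following is a minimal sketch, not part of the dataset distribution: the dictionary is copied from the description, while the helper names `decode_model_label` and `decode_rows`, the `label` field name, and the added `model_name` key are hypothetical, since the file format and column names are not stated in this record.

```python
# Subtask 2 label mapping, copied from the dataset description.
MODEL_LABELS = {
    "A": "bloom-1b7",
    "B": "bloom-3b",
    "C": "bloom-7b1",
    "D": "babbage",
    "E": "curie",
    "F": "text-davinci-003",
}


def decode_model_label(label: str) -> str:
    """Return the generator model name for a subtask 2 label such as "A"."""
    key = label.strip().upper()
    if key not in MODEL_LABELS:
        raise ValueError(f"unknown subtask 2 label: {label!r}")
    return MODEL_LABELS[key]


def decode_rows(rows, label_field="label"):
    """Return copies of the rows with a 'model_name' entry added.

    Assumption: each row is a dict and `label_field` holds the letter label.
    The real column name in the dataset files may differ.
    """
    return [
        {**row, "model_name": decode_model_label(row[label_field])}
        for row in rows
    ]
```

For example, `decode_model_label("F")` gives `"text-davinci-003"`, and an unrecognized label raises `ValueError` rather than being silently passed through.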

Files

Restricted

The record is publicly accessible, but files are restricted.

Statistic	All versions	This version
Views	953	935
Downloads	58	57
Data volume	1.8 GB	1.7 GB