GPT-3 Curie generated synthetic datasets based on the datasets: Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, SemEval 2022 Task 4

Schmidhuber, Maximilian

doi:10.5281/zenodo.10022788

Published October 19, 2023 | Version v1

Dataset Open

Schmidhuber, Maximilian (Researcher)¹

1. University of Regensburg

This record combines six synthetic datasets of toxic, hateful, or patronizing text, each derived from the dataset published in one of the following papers:

"Large scale crowdsourcing and characterization of twitter abusive behavior"

"Hate Speech Dataset from a White Supremacy Forum"

"Automated hate speech detection and the problem of offensive language"

"Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter"

"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments"

"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities"

Each file was generated by a separate GPT-3 Curie model that was fine-tuned on the examples of a single label from one source dataset. The data is unfiltered and will likely need processing before it is useful.
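The record does not say which processing steps are needed. As one possible starting point, the sketch below collapses whitespace and drops empty and exact-duplicate generations. The helper name `clean_generations` and the choice of steps are assumptions for illustration, not the author's pipeline.

```python
from typing import Iterable, List


def clean_generations(texts: Iterable[str]) -> List[str]:
    """Collapse whitespace, then drop empty strings and exact duplicates.

    Hypothetical helper: the record does not specify any filtering steps.
    The order of first occurrences is preserved.
    """
    seen = set()
    cleaned = []
    for text in texts:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        if not normalized or normalized in seen:
            continue  # skip blanks and texts already kept
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```

Model-generated text often contains near-duplicates and truncated samples as well, so stricter filters may be worth adding once the actual file contents have been inspected.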

Files (86.1 MB)

Name	MD5	Size
GE_non_toxic_synthetic_data.tsv	24012e90a17dcf895ccd873b3474075f	8.2 MB
GE_toxic_synthetic_data.tsv	0a9dd05f6eb8fc98243d8d6f4d886aed	8.6 MB
new_Davidson_hateful_synthetic_data.csv	661908b264eb185a0b376f509322eaba	5.0 MB
new_Davidson_non_hateful_synthetic_data.csv	4134947725625708582cefd0a3ae35fe	4.2 MB
new_Founta_hateful_synthetic_data.csv	294f6174ab10da4dc090a2ba900c57cd	5.9 MB
new_Founta_non_hateful_synthetic_data.csv	a49484e1b314689de61a700041cac723	5.1 MB
new_HatEval_hateful_synthetic_data.tsv	68a9ffdbda5e3a691755abb31393dc51	7.1 MB
new_HatEval_non_hateful_synthetic_data.tsv	04152b49e41e27b4d9a2e2779ef43422	7.3 MB
non_patronizing_synthetic_data.csv	7692bb1d69fa9f52a3d88147a6fbfae5	11.2 MB
patronizing_synthetic_data.csv	bbd44f5720ed8ad4bd09fc114e9b87e2	13.8 MB
Stormfront_hateful_synthetic_data.tsv	5e84a326bb9b2ea027b1c0451f80f431	5.8 MB
Stormfront_non_hateful_synthetic_data.tsv	2b4af47eb39829777312331f62f8f16f	3.8 MB
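The MD5 values listed with the files can be used to check that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the function names `md5_of_file` and `verify_download` are illustrative, and the local file path is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with the checksum listed in the record."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, assuming the file was saved to the working directory, `verify_download("GE_toxic_synthetic_data.tsv", "0a9dd05f6eb8fc98243d8d6f4d886aed")` should return `True` for an uncorrupted copy.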

Additional details

Created: 2023-10-19

	All versions	This version
Views	305	305
Downloads	714	714
Data volume	5.3 GB	5.3 GB
