GPT-3 Curie generated synthetic datasets based on the datasets: Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, SemEval 2022 Task 4
Description
This dataset is a composition of six toxic or hateful synthetic datasets based on the datasets published by:
"Large scale crowdsourcing and characterization of twitter abusive behavior"
"Hate Speech Dataset from a White Supremacy Forum"
"Automated hate speech detection and the problem of offensive language"
"Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter"
"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments"
"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities"
All data was generated by separate GPT-3 Curie models, each fine-tuned on one label of one of the source datasets. The data is unfiltered and will likely need processing before it is useful.
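Because the generated text is unfiltered, a first cleaning pass is probably needed. The following is a minimal sketch of one possible pass, not the authors' procedure: it collapses whitespace, drops near-empty generations, and removes case-insensitive exact duplicates. The record does not document the CSV layout, so the `column` argument, the `min_chars` threshold, and both function names are assumptions made for illustration.

```python
import csv
from typing import Iterable, Iterator


def clean_texts(texts: Iterable[str], min_chars: int = 3) -> Iterator[str]:
    """Yield whitespace-normalised, non-trivial, de-duplicated texts."""
    seen = set()
    for text in texts:
        normalised = " ".join(text.split())  # collapse runs of whitespace
        if len(normalised) < min_chars:
            continue  # drop empty or near-empty generations
        key = normalised.casefold()
        if key in seen:
            continue  # drop case-insensitive exact duplicates
        seen.add(key)
        yield normalised


def read_text_column(path: str, column: str) -> Iterator[str]:
    """Read one column from a CSV file.

    The column name is hypothetical: inspect the file header first,
    since the record does not state how the CSVs are laid out.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row.get(column) or ""
```

A call such as `list(clean_texts(read_text_column(path, column)))` would then give a cleaned list for one file. Stricter filtering (language checks, near-duplicate detection, label validation) depends on the downstream task and is left out here.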
Files
Total size: 86.1 MB
Example file: new_Davidson_hateful_synthetic_data.csv
MD5 checksum | Size |
---|---|
md5:24012e90a17dcf895ccd873b3474075f | 8.2 MB |
md5:0a9dd05f6eb8fc98243d8d6f4d886aed | 8.6 MB |
md5:661908b264eb185a0b376f509322eaba | 5.0 MB |
md5:4134947725625708582cefd0a3ae35fe | 4.2 MB |
md5:294f6174ab10da4dc090a2ba900c57cd | 5.9 MB |
md5:a49484e1b314689de61a700041cac723 | 5.1 MB |
md5:68a9ffdbda5e3a691755abb31393dc51 | 7.1 MB |
md5:04152b49e41e27b4d9a2e2779ef43422 | 7.3 MB |
md5:7692bb1d69fa9f52a3d88147a6fbfae5 | 11.2 MB |
md5:bbd44f5720ed8ad4bd09fc114e9b87e2 | 13.8 MB |
md5:5e84a326bb9b2ea027b1c0451f80f431 | 5.8 MB |
md5:2b4af47eb39829777312331f62f8f16f | 3.8 MB |
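The MD5 values listed for the files can be used to check a download for corruption. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative, and because the listing does not show which checksum belongs to which file name, the caller has to supply that pairing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:24012e90...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]  # drop the 'md5:' prefix
    return md5_of_file(path) == expected
```

A mismatch usually means an incomplete or corrupted download, or that the checksum was paired with the wrong file.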
Additional details
Dates
- Created: 2023-10-19