CAD: the Contextual Abuse Dataset

Vidgen, Bertie; Nguyen, Dong; Margetts, Helen; Rossini, Patricia; Tromble, Rebekah

doi:10.5281/zenodo.4881008

Published May 31, 2021 | Version v1.0 and v1.1

Dataset Open

Introducing CAD: the Contextual Abuse Dataset. Bertie Vidgen, Dong Nguyen, Helen Margetts, Patricia Rossini, Rebekah Tromble. NAACL 2021.

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high-quality, detailed datasets. We introduce a new dataset of primarily English Reddit entries that addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales, and (4) uses an expert-driven group-adjudication process for high-quality annotations. This repository contains the annotated dataset, the annotation guidelines, and the trained models with their output.

Code: https://github.com/dongpng/cad_naacl2021

Paper: https://www.aclweb.org/anthology/2021.naacl-main.182/

Files (663.2 MB)

Name	MD5	Size
data.zip	63091d7a48bac5666e818fefed1f6575	9.9 MB
experiments.zip	ade101318f4e821870f39138e582d780	653.2 MB
README.txt	cbf10b21d62d2b18c5b06e1681b2ee9b	3.6 kB
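The MD5 checksums listed for the three files can be used to check that a download arrived intact. The following is a minimal sketch, assuming the files were saved under their listed names in a single local directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and are not part of the dataset or its code repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files on this record.
EXPECTED_MD5 = {
    "data.zip": "63091d7a48bac5666e818fefed1f6575",
    "experiments.zip": "ade101318f4e821870f39138e582d780",
    "README.txt": "cbf10b21d62d2b18c5b06e1681b2ee9b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the downloads sit in the current working directory.
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy. Reading in chunks matters mainly for experiments.zip, which is large enough that loading it whole would be wasteful.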

	All versions	This version
Views	2,997	2,975
Downloads	1,594	1,585
Data volume	194.0 GB	193.9 GB

DOI

10.5281/zenodo.4881008

Resource type

Dataset

Publisher

Zenodo

Conference

2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), 6-11 June 2021

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: June 4, 2021
Modified: June 4, 2021
