Dataset Open Access

CAD: the Contextual Abuse Dataset

Vidgen, Bertie; Nguyen, Dong; Margetts, Helen; Rossini, Patricia; Tromble, Rebekah

Introducing CAD: the Contextual Abuse Dataset. Bertie Vidgen, Dong Nguyen, Helen Margetts, Patricia Rossini, Rebekah Tromble. NAACL 2021.

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high-quality, detailed datasets. We introduce a new dataset of primarily English Reddit entries that addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales, and (4) uses an expert-driven group-adjudication process for high-quality annotations. This repository contains the annotated dataset, the annotation guidelines, and the trained models with their output.

Code: https://github.com/dongpng/cad_naacl2021

Paper: https://www.aclweb.org/anthology/2021.naacl-main.182/

Files (663.2 MB)
Name             Size      MD5
data.zip         9.9 MB    63091d7a48bac5666e818fefed1f6575
experiments.zip  653.2 MB  ade101318f4e821870f39138e582d780
README.txt       3.6 kB    cbf10b21d62d2b18c5b06e1681b2ee9b
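
The MD5 checksums listed with the files can be used to confirm that a download completed without corruption. The following is a minimal sketch, not part of the repository: it assumes the three files were saved under their listed names in a single directory, and the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "data.zip": "63091d7a48bac5666e818fefed1f6575",
    "experiments.zip": "ade101318f4e821870f39138e582d780",
    "README.txt": "cbf10b21d62d2b18c5b06e1681b2ee9b",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so the 653 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the downloads are in the current working directory.
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A file reported as a mismatch is most likely a truncated download and is worth fetching again before unpacking.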