Published November 29, 2019 | Version 1.0.0
Dataset Open

WAC Corpus - Wikipedia Abusive Conversations

  • 1. Avignon Université

Description

This repository contains conversations between Wikipedia editors, annotated at the message level for several types of abuse. The corpus is described in the following publication:

N. Cécillon, V. Labatut, R. Dufour, and G. Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference, 2020, pp. 1375–1383. ⟨hal-02497514⟩

If you use this dataset, please cite the above article. The repository also contains the figures shown in this article.

Our corpus aligns two existing corpora:

Files

length_conversation.pdf

Files (2.1 GB)

Checksum Size
md5:9fd05fb6ed44f690d6412712d4cf5561 222.7 kB
md5:8901e606254817dc82019a8c361e85a1 292.4 kB
md5:d45fdd504a8cf30ba85b4c0c72c4d6d5 15.8 kB
md5:8c7da6b7eb6affe196530fa0bac1e8aa 14.5 kB
md5:01f6d8cffd7c289423959cb1bf48bc72 46.3 kB
md5:0d2a46222b256e1280121f944962ec15 2.1 GB

Additional details

Related works

Is documented by
Conference paper: hal-02497514 (hal)
Obsoletes
Dataset: 10.6084/m9.figshare.11302385 (DOI)
Dataset: 10.6084/m9.figshare.11299118 (DOI)