WAC Corpus - Wikipedia Abusive Conversations
Description
This repository contains conversations between Wikipedia editors, annotated at the message level for several types of abuse. The corpus is described in the following publication:
N. Cécillon, V. Labatut, R. Dufour, and G. Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference, 2020, pp. 1375–1383. ⟨hal-02497514⟩
The repository also contains the figures shown in this article.
Sources. Our corpus aligns two existing corpora:
- Messages and conversation structures from WikiConv (https://github.com/conversationai/wikidetox/tree/master/wikiconv)
- Manual toxicity annotations from the Wikipedia Comment Corpus (WCC: https://doi.org/10.6084/m9.figshare.4054689)
Citation. If you use this dataset, please cite the above article.
@InProceedings{Cecillon2020,
author = {Cécillon, Noé and Labatut, Vincent and Dufour, Richard and Linarès, Georges},
title = {{WAC}: A Corpus of {W}ikipedia Conversations for Online Abuse Detection},
booktitle = {12\textsuperscript{th} Language Resources and Evaluation Conference},
year = {2020},
pages = {1375-1383},
address = {Marseille, FR},
url = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.172.pdf},
}
Files
Total size: 2.1 GB (previewed file: length_conversation.pdf)
MD5 checksum | Size |
---|---|
md5:9fd05fb6ed44f690d6412712d4cf5561 | 222.7 kB |
md5:8901e606254817dc82019a8c361e85a1 | 292.4 kB |
md5:d45fdd504a8cf30ba85b4c0c72c4d6d5 | 15.8 kB |
md5:8c7da6b7eb6affe196530fa0bac1e8aa | 14.5 kB |
md5:01f6d8cffd7c289423959cb1bf48bc72 | 46.3 kB |
md5:0d2a46222b256e1280121f944962ec15 | 2.1 GB |
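The MD5 checksums listed above can be used to confirm that a download completed without corruption, which matters most for the 2.1 GB archive. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the corpus, and the file path is supplied by the user, since the file names are not given in the listing.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:9fd05f...'.
    The 'md5:' prefix is optional and the comparison ignores case."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Usage (hypothetical path): python verify.py <downloaded_file> <md5:...>
    if len(sys.argv) == 3:
        file_path, listed_checksum = sys.argv[1], sys.argv[2]
        status = "OK" if matches_checksum(file_path, listed_checksum) else "MISMATCH"
        print(f"{file_path}: {status}")
```

A mismatch usually indicates an interrupted or truncated download, in which case fetching the file again is the simplest remedy.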
Additional details
Related works
- Is documented by
- Conference paper: http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.172.pdf (URL)
- Obsoletes
- Dataset: 10.6084/m9.figshare.11302385 (DOI)
- Dataset: 10.6084/m9.figshare.11299118 (DOI)