Published August 24, 2021 | Version v2
Dataset Open

RP-Mod & RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets

  • 1. University of Münster - ERCIS
  • 2. University of Koblenz-Landau

Description

Abuse and hate are penetrating social media and the comment sections of many news media companies. These platform providers invest considerable effort in moderating user-generated contributions so that they do not lose readers who are appalled by inappropriate texts. Legislation reinforces this pressure by making the failure to remove such comments a punishable offence. While (semi-)automated solutions using Natural Language Processing and advanced Machine Learning techniques are becoming increasingly sophisticated, the domain of abusive language detection still struggles because large, well-curated non-English datasets are scarce or not publicly available.

With this work, we publish and analyse the largest annotated German abusive language comment datasets to date. In contrast to existing datasets, we achieve a high labelling standard by conducting a thorough crowd-based annotation study that complements professional moderators' decisions, which are also included in the dataset. We compare and cross-evaluate the performance of baseline algorithms and state-of-the-art transformer-based language models, which are fine-tuned on our datasets and on an existing alternative, demonstrating the datasets' usefulness for the community.

Notes

The research leading to these results received funding from the federal state of North Rhine-Westphalia and the European Regional Development Fund (EFRE.NRW 2014-2020), Project: MODERAT! (No. CM-2-2-036a).

Files

RP-Crowd-1-folds.csv

Files (81.2 MB)

Checksum  Size
md5:fc0e87dc16071ce1c3f062c44694a4d6  27.7 kB
md5:604ab76a1ce43392be2828868380dc9c  10.9 MB
md5:00b5eb167a982414198387c193348fef  14.1 MB
md5:541b7a9521cd8ba04e4ee1a02dd633b4  14.1 MB
md5:3951b2ca327c8af0875cb0963d53b747  4.2 MB
md5:5185160d6ee3fcb10d98e8a264a124c7  4.2 MB
md5:86009f480c3e4a826e140f5cc0ae8407  1.5 MB
md5:12250a4e4ed17bc80ad724b22df8117a  1.5 MB
md5:33301b7c98db8a105925aaa4ad75f376  454.2 kB
md5:cf3150a86a0ed745733cc17c10f2fc7f  456.2 kB
md5:09fc3b493faaa3bc4aa2a0c5ddeeb6a2  93.1 kB
md5:15e346bee37f9e38d9c06f79bd5e05ad  93.5 kB
md5:eec963103f9baca53153278b72206fb3  23.2 MB
md5:a2caffcfa3d7c1e1e2662e41c7f21bb7  3.2 MB
md5:2b54e6c10ee1ed27ce4af0afe63208ba  3.1 MB
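
The MD5 checksums in the file listing can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard library; the helper names `md5_of_file` and `matches_listing` are illustrative assumptions and are not part of the dataset. The listing does not show which file name belongs to which checksum, so the path passed in is up to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in fixed-size chunks so large CSV files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:fc0e...'."""
    expected = listed.strip().lower()
    # The record page prefixes each digest with 'md5:'; drop it if present.
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listing("RP-Crowd-1-folds.csv", "md5:...")`, with the checksum shown next to that file on the record page, returns `True` when the download is intact. MD5 is suitable here as an integrity check against transfer errors, not as protection against deliberate tampering.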