SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

Zampieri, Marcos; Nakov, Preslav; Rosenthal, Sara; Atanasova, Pepa; Karadzhov, Georgi; Mubarak, Hamdy; Derczynski, Leon; Pitenis, Zeses; Çöltekin, Çağrı

doi:10.5281/zenodo.3950379

Published July 17, 2020 | Version 1.0

Dataset | Open Access

1. Rochester Institute of Technology
2. Qatar Computing Research Institute, HBKU
3. IBM Research
4. University of Copenhagen
5. University of Cambridge
6. Qatar Computing Research Institute, Qatar
7. IT University of Copenhagen, Denmark
8. University of Wolverhampton, UK
9. University of Tübingen, Germany

The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID annotation schema (Zampieri et al., 2019) used in OffensEval 2019: Subtask A, offensive language identification; Subtask B, automatic categorization of offense types; and Subtask C, offense target identification. The task featured five languages (Arabic, Danish, English, Greek, and Turkish), all of which included Subtask A; English additionally featured Subtasks B and C. This upload covers English only. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting participants across all subtasks and languages: 528 teams signed up to participate, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
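In OLID, Subtask A labels a post as offensive (OFF) or not offensive (NOT); Subtask B labels an offensive post as a targeted insult (TIN) or untargeted (UNT); and Subtask C labels the target of a targeted insult as an individual (IND), a group (GRP), or other (OTH). Because the taxonomy is hierarchical, a lower-level label only exists when the level above permits it. The minimal Python sketch below checks that constraint for one post; the function name `validate_olid_labels` is hypothetical and not part of any released tooling for this dataset.

```python
from typing import Optional

# Label sets of the three OLID levels.
SUBTASK_A = {"OFF", "NOT"}         # offensive vs. not offensive
SUBTASK_B = {"TIN", "UNT"}         # targeted insult vs. untargeted (only for OFF)
SUBTASK_C = {"IND", "GRP", "OTH"}  # target type (only for TIN)


def validate_olid_labels(a: str, b: Optional[str] = None, c: Optional[str] = None) -> bool:
    """Return True if the (a, b, c) labels of one post respect the OLID hierarchy."""
    if a not in SUBTASK_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no Subtask B or Subtask C label.
        return b is None and c is None
    # Offensive posts must have a Subtask B label.
    if b not in SUBTASK_B:
        return False
    if b == "UNT":
        # Untargeted offense has no target, so no Subtask C label.
        return c is None
    # Targeted insults must name a target type.
    return c in SUBTASK_C


print(validate_olid_labels("OFF", "TIN", "GRP"))  # True
print(validate_olid_labels("NOT", "TIN"))         # False
```

Representing missing lower-level labels as `None` keeps the three levels in one tuple per post while making the hierarchy explicit.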

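This upload includes two test sets: the test set used in the paper describing the dataset (extended_test) and the official test set used in the shared task (semeval_test). It also includes the distantly supervised data files for Subtasks A, B, and C (task_a_distant.tsv.zip, task_b_distant.tsv.zip, and task_c_distant.tsv.zip).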

The English evaluation phase is available on CodaLab: https://competitions.codalab.org/competitions/23285
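OffensEval 2020 ranked systems by macro-averaged F1, the unweighted mean of the per-class F1 scores, so minority classes count as much as the majority class. The plain-Python sketch below illustrates that computation; it is not the official CodaLab scoring script, and `macro_f1` is a hypothetical name.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("no labels to score")
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


gold = ["OFF", "NOT", "NOT", "OFF"]
pred = ["OFF", "NOT", "OFF", "NOT"]
print(macro_f1(gold, pred))  # 0.5
```

Here both classes have precision and recall of 0.5, so each per-class F1 is 0.5 and their mean is 0.5.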

The shared task website is https://sites.google.com/site/offensevalsharedtask/home

Files

Files (244.1 MB)

Name	MD5 checksum	Size
extended_test-20200717T190516Z-001.zip	72a47ea414eaa6116075d43810b6eb00	469.3 kB
README.md	c4026ffda9998603b9d9d94ce16e2692	8.6 kB
semeval_test-20200717T190531Z-001.zip	7476d6555ea05374002cfd0a46667a0b	261.7 kB
task_a_distant.tsv.zip	4ff61a2c75f36e91d7b6616af2684859	226.9 MB
task_b_distant.tsv.zip	2bf18f6eb890ac7fc787897a79a7bc48	4.8 MB
task_c_distant.tsv.zip	f34dc50cffed1313026b29b91173a78d	11.6 MB
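Downloaded files can be checked against the MD5 checksums listed above. The Python sketch below assumes all six files were saved under their original names in a single directory; `md5_of` and `verify_downloads` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# MD5 checksums copied from the file listing.
EXPECTED_MD5 = {
    "extended_test-20200717T190516Z-001.zip": "72a47ea414eaa6116075d43810b6eb00",
    "README.md": "c4026ffda9998603b9d9d94ce16e2692",
    "semeval_test-20200717T190531Z-001.zip": "7476d6555ea05374002cfd0a46667a0b",
    "task_a_distant.tsv.zip": "4ff61a2c75f36e91d7b6616af2684859",
    "task_b_distant.tsv.zip": "2bf18f6eb890ac7fc787897a79a7bc48",
    "task_c_distant.tsv.zip": "f34dc50cffed1313026b29b91173a78d",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the 226.9 MB archive need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: str = ".") -> Dict[str, Optional[bool]]:
    """Map each expected file to True (match), False (mismatch), or None (missing)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reporting a missing file separately from a mismatch makes it easy to tell an incomplete download from a corrupted one.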

Additional details

Rosenthal, Sara, et al. "A large-scale semi-supervised dataset for offensive language identification." arXiv preprint arXiv:2004.14454 (2020).
Zampieri, Marcos, et al. "SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)." arXiv preprint arXiv:2006.07235 (2020).
