Published July 17, 2020 | Version 1.0
Dataset | Open access

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

  • 1. Rochester Institute of Technology
  • 2. Qatar Computing Research Institute, HBKU
  • 3. IBM Research
  • 4. University of Copenhagen
  • 5. University of Cambridge
  • 6. Qatar Computing Research Institute, Qatar
  • 7. IT University of Copenhagen, Denmark
  • 8. University of Wolverhampton, UK
  • 9. University of Tübingen, Germany


The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019) from OffensEval 2019. The task featured five languages (Arabic, Danish, English, Greek, and Turkish), and this upload is for English. Subtask A was offered in all five languages, while English additionally featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: 528 teams signed up to participate, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
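The hierarchy of the OLID taxonomy can be made concrete with a small validity check. The sketch below is an illustration only, not code released by the task organizers. It assumes the OLID label abbreviations (Subtask A: `OFF` offensive vs. `NOT` not offensive; Subtask B: `TIN` targeted insult or threat vs. `UNT` untargeted; Subtask C: `IND` individual, `GRP` group, `OTH` other target), and the function name `is_valid_olid_path` is hypothetical. As a design choice, it also assumes that lower-level labels may be absent for posts annotated only at a higher level.

```python
from typing import Optional

# Label inventories of the three OLID levels (assumed abbreviations).
SUBTASK_A = {"OFF", "NOT"}         # offensive vs. not offensive
SUBTASK_B = {"TIN", "UNT"}         # targeted insult/threat vs. untargeted
SUBTASK_C = {"IND", "GRP", "OTH"}  # target: individual, group, other


def is_valid_olid_path(a: str, b: Optional[str] = None, c: Optional[str] = None) -> bool:
    """Hypothetical helper: True if (a, b, c) respects the OLID hierarchy.

    Subtask B labels only apply to OFF posts, and Subtask C labels only
    apply to TIN posts. Missing lower-level labels (None) are allowed.
    """
    if a not in SUBTASK_A:
        return False
    if a == "NOT":
        # Non-offensive posts receive no type or target label.
        return b is None and c is None
    # From here on a == "OFF"; the type label may be missing.
    if b is None:
        return c is None
    if b not in SUBTASK_B:
        return False
    if b == "UNT":
        # Untargeted offenses have no target label.
        return c is None
    # b == "TIN": the target may be missing or one of the three categories.
    return c is None or c in SUBTASK_C


print(is_valid_olid_path("OFF", "TIN", "GRP"))  # True
print(is_valid_olid_path("NOT", "TIN"))         # False
```

Encoding the hierarchy as a check like this is one way to catch inconsistent annotations or predictions when combining outputs for Subtasks A, B, and C.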

This upload includes the test set used in the paper describing the dataset (Rosenthal et al., 2020), as well as the official test set used in the shared task.
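OffensEval ranked systems by macro-averaged F1, i.e. the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that computation for scoring predictions against a labeled test set; it is not the official scorer. The function name `macro_f1` is hypothetical, and it averages over every label appearing in either the gold or the predicted labels, the same convention as scikit-learn's `f1_score(..., average="macro")`.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Hypothetical helper: unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Each class counts equally, regardless of how frequent it is.
    return sum(scores) / len(scores) if scores else 0.0


gold = ["OFF", "NOT", "NOT", "OFF"]
pred = ["OFF", "NOT", "OFF", "OFF"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Here `OFF` scores F1 = 0.8 and `NOT` scores F1 ≈ 0.667, so the macro average is about 0.733. Because every class is weighted equally, a system cannot score well simply by favoring the majority class.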

The evaluation phase for English is available on CodaLab:

The website for the shared task is:


Files (244.1 MB)

  • 469.3 kB
  • 8.6 kB
  • 261.7 kB
  • 226.9 MB
  • 4.8 MB
  • 11.6 MB

Additional details


  • Rosenthal, Sara, et al. "A large-scale semi-supervised dataset for offensive language identification." arXiv preprint arXiv:2004.14454 (2020).
  • Zampieri, Marcos, et al. "SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)." arXiv preprint arXiv:2006.07235 (2020).