There is a newer version of the record available.

Published February 21, 2020 | Version 1
Dataset Restricted

Restricted Dataset for "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior"

  • 1. Aristotle University of Thessaloniki
  • 2. Cyprus University of Technology
  • 3. Telefonica Research
  • 4. University of Alabama at Birmingham
  • 5. University College London


Restricted dataset for the paper "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior", published at ICWSM 2018. The full text of the paper can be found here. The public version of the dataset can be found here.

  • hatespeech_text_label_vote_RESTRICTED_100K.csv: contains ~100K rows with the tweet text, the associated majority label, and the number of votes for the majority label. The tweets are shuffled so that there is no connection between tweet IDs and texts, in line with Twitter's T&C.

  • retweets.csv: contains ~2K rows. Each row starts with the row number, in hatespeech_text_label_vote_RESTRICTED_100K.csv, of the first occurrence of a tweet text, followed by the comma-separated row numbers of all other occurrences of the same tweet text in that file. There are ~8K other occurrences.
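
The relationship between the two files can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: the record does not say whether row numbers start at 0 or 1, whether either file has a header line, or which delimiter the main file uses, so `first_row_number`, the comma delimiter, and the three-column layout of the sample are all assumptions. The helper names `read_duplicate_groups` and `drop_repeats` are hypothetical, and the sample rows are made up.

```python
import csv
import io


def read_duplicate_groups(retweets_file):
    """Map each first-occurrence row number to the row numbers of its repeats."""
    groups = {}
    for fields in csv.reader(retweets_file):
        numbers = [int(f) for f in fields if f.strip()]
        if numbers:
            groups[numbers[0]] = numbers[1:]
    return groups


def drop_repeats(rows, groups, first_row_number=0):
    """Keep only the first occurrence of each tweet text.

    first_row_number is an assumption: the record does not state
    whether row numbers in retweets.csv start at 0 or 1.
    """
    repeats = {n for later in groups.values() for n in later}
    return [
        row
        for i, row in enumerate(rows, start=first_row_number)
        if i not in repeats
    ]


# Tiny in-memory stand-ins for the two files (made-up data, not from the dataset).
sample_main = io.StringIO(
    "text a,normal,3\n"
    "text b,abusive,4\n"
    "text a,normal,3\n"
    "text a,normal,3\n"
)
sample_retweets = io.StringIO("0,2,3\n")

rows = list(csv.reader(sample_main))
groups = read_duplicate_groups(sample_retweets)
unique_rows = drop_repeats(rows, groups)
print(len(rows), len(unique_rows))
```

With the real files, the `io.StringIO` objects would be replaced by `open(...)` calls, after checking the delimiter and the row-number base against the data. If the approximate counts above hold, dropping the ~8K repeated occurrences would leave roughly 92K rows.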

Please cite the paper in any published work that uses any of these resources. 

    title={Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior}, 
    author={Founta, Antigoni-Maria and Djouvas, Constantinos and Chatzakou, Despoina and Leontiadis, Ilias and Blackburn, Jeremy and Stringhini, Gianluca and Vakali, Athena and Sirivianos, Michael and Kourtellis, Nicolas}, 
    booktitle={11th International Conference on Web and Social Media, ICWSM 2018}, 
    organization={AAAI Press} 

For any further questions, contact a.m.founta at gmail dot com and markos.charalambous at eecei dot cut dot ac dot cy.



The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

  1. You will not attempt to use this data to de-anonymize, in any way, any users in this or any other dataset.
  2. You will not re-share the dataset with anyone not included in this request.
  3. You will appropriately cite the "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior" ICWSM 2018 paper in any publication, of any form and kind, using this data.


Additional details


European Commission: ENCASE – EnhaNcing seCurity And privacy in the Social wEb: a user centered approach for the protection of minors (grant 691025)