There is a newer version of the record available.

Published April 27, 2023 | Version v1
Dataset Open

Antisemitism on Twitter: A Dataset for Machine Learning and Text Analytics

  • 1. Indiana University Bloomington
  • 2. Technical University Berlin

Description

# Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University Dataset:

 
The ISCA project has compiled this dataset using an annotation portal, which was used to label tweets as either antisemitic or non-antisemitic, among other labels. Please note that the annotation was done with live data, including images and the context, such as threads. The original data was sourced from annotationportal.com. 
 
# Content: 
This dataset contains 6,941 tweets posted between January 2019 and December 2021 that cover a wide range of topics common in conversations about Jews, Israel, and antisemitism. The tweets are drawn from representative samples of tweets containing relevant keywords during this period. 1,250 tweets (18%) meet the IHRA definition of antisemitic messages.


The distribution of tweets by year is as follows: 1,499 (22%) from 2019, 3,716 (54%) from 2020, and 1,726 (25%) from 2021. By query keyword, 4,605 (66%) contain "Jews," 1,524 (22%) include "Israel," 529 (8%) feature the derogatory term "ZioNazi*," and 283 (4%) use the slur "K---s." Some tweets may contain multiple keywords.

Of the 4,605 tweets with the keyword "Jews," 483 (10%) were classified as antisemitic, as were 203 of the 1,524 tweets with the keyword "Israel" (13%). Of the 283 tweets using the antisemitic slur "K---s," 97 (34%) are antisemitic; many tweets featuring this slur actually call out its usage. In contrast, most tweets with the derogatory term "ZioNazi*" are antisemitic: 467 out of 529 (88%) were classified as such.
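The counts quoted above can be cross-checked with a few lines of arithmetic. The following sketch uses only the figures stated in this description; the variable and function names are illustrative and are not part of the dataset or its accompanying code. It recomputes each keyword's share of the dataset and the fraction of its tweets labeled antisemitic.

```python
# Counts copied from the description above; names are illustrative only.
TOTAL_TWEETS = 6941

# keyword -> (tweets with that keyword, tweets labeled antisemitic)
KEYWORD_COUNTS = {
    "Jews": (4605, 483),
    "Israel": (1524, 203),
    "ZioNazi*": (529, 467),
    "K---s": (283, 97),
}


def summarize(counts, total):
    """Return {keyword: (share of dataset, antisemitic rate)} as fractions."""
    return {
        keyword: (n_tweets / total, n_antisemitic / n_tweets)
        for keyword, (n_tweets, n_antisemitic) in counts.items()
    }


summary = summarize(KEYWORD_COUNTS, TOTAL_TWEETS)
antisemitic_total = sum(n_anti for _, n_anti in KEYWORD_COUNTS.values())

for keyword, (share, rate) in summary.items():
    print(f"{keyword:10s} share={share:.1%} antisemitic={rate:.1%}")
print(f"antisemitic overall: {antisemitic_total} ({antisemitic_total / TOTAL_TWEETS:.1%})")
```

The per-keyword tweet counts add up to the 6,941 tweets in the dataset, and the per-keyword antisemitic counts add up to the 1,250 antisemitic tweets reported in the Content section.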

 

File Description: 

The dataset is provided as a CSV file, with each row representing a single tweet, including replies, quotes, and retweets. The file contains the following columns:

‘TweetID’: The tweet ID.

‘Username’: The username of the account that published the tweet.

‘Text’: The full text of the tweet.

‘CreateDate’: The date the tweet was created.

‘Biased’: The label assigned by our annotators, indicating whether the tweet is antisemitic or non-antisemitic.

‘Keyword’: The keyword that was used in the query. The keyword can appear in the text, including mentioned names, or in the username.
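As a minimal sketch of reading the file, the following uses Python's standard `csv` module and checks that the six columns listed above are present. The function name `load_tweets` and the sample row are hypothetical, and the code assumes the CSV has a header row with exactly these column names. The name of the CSV inside the archive, its text encoding, and the way the ‘Biased’ values are encoded are not specified in this description, so the example does not interpret them.

```python
import csv
import io

# Column names as listed in the file description.
EXPECTED_COLUMNS = ["TweetID", "Username", "Text", "CreateDate", "Biased", "Keyword"]


def load_tweets(fileobj):
    """Read tweets from an open CSV file object into a list of dicts.

    Assumes a header row containing the expected column names (an assumption,
    not something the description guarantees).
    """
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample row for illustration only; not taken from the dataset.
sample = io.StringIO(
    "TweetID,Username,Text,CreateDate,Biased,Keyword\n"
    '1,example_user,"An example tweet, with a comma",2020-01-01,0,Israel\n'
)
rows = load_tweets(sample)
print(rows[0]["Keyword"], len(rows))
```

All values, including ‘TweetID’, are kept as strings, which avoids precision loss if the IDs are large integers. For a real file, pass a file object opened with `newline=""` as the `csv` module documentation recommends.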

 

Licences 

Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0) 

R code is published under the terms of the "MIT" licence (https://opensource.org/licenses/MIT)

 

Acknowledgements 

We are grateful for the support of Indiana University’s Observatory on Social Media (OSoMe) (Davis et al. 2016) and the contributions and annotations of all team members in our Social Media & Hate Research Lab at Indiana University’s Institute for the Study of Contemporary Antisemitism, especially Grace Bland, Elisha S. Breton, Kathryn Cooper, Robin Forstenhäusler, Sophie von Máriássy, Mabel Poindexter, Jenna Solomon, Clara Schilling, and Victor Tschiskale. 

This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Files

GoldStander_Dataset.zip

Files (752.4 kB)

md5:31dd1da81f871f0440db7330df38ea07
752.4 kB

Additional details