Dataset Open Access

Dataset for "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls"

Savvas Zannettou; Tristan Caulfield; William Setzer; Michael Sirivianos; Gianluca Stringhini; Jeremy Blackburn

This is the dataset used for the study "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls" by Savvas Zannettou, Tristan Caulfield, William Setzer, Michael Sirivianos, Gianluca Stringhini, and Jeremy Blackburn. arXiv, 2019. DOI: 10.5281/zenodo.2558560

The dataset consists of the data released by Twitter in October 2018 for Russian and Iranian state-sponsored troll accounts, as well as intermediate data that we generated after processing the raw data.
For instance, we include trained Word2Vec and LDA models, the output of our influence estimation experiments via Hawkes Processes, and other data necessary to reproduce the results in the paper.
To use the provided data, download the compressed file from <URL>, uncompress it, and make sure that the uncompressed data folder is in the same directory as the IPython Notebook.
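
The placement requirement above can be checked from the notebook before running any analysis. The following is a minimal sketch, not part of the released code: the function name `locate_data_dir` and the default folder name `data` are hypothetical placeholders, so substitute the actual name of the folder produced by uncompressing the archive.

```python
from pathlib import Path


def locate_data_dir(notebook_dir=".", folder_name="data"):
    """Return the path of the uncompressed data folder next to the notebook.

    `folder_name` is a hypothetical placeholder; pass the real name of the
    folder that the downloaded archive uncompresses to.
    """
    data_dir = Path(notebook_dir).resolve() / folder_name
    if not data_dir.is_dir():
        # Fail early with a clear message instead of a later file-not-found
        # error somewhere deep inside the notebook.
        raise FileNotFoundError(
            f"Expected the uncompressed data folder at {data_dir}; "
            "move it next to the notebook or pass the correct folder_name."
        )
    return data_dir
```

Calling `locate_data_dir()` in the first notebook cell would either return the resolved folder path or stop with an explicit error when the folder is not in the notebook's directory.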

The code used for this study can be found here:

Please cite our paper if any publication, of any form or kind, results from your use of this data:

  title={Who let the trolls out? towards understanding state-sponsored trolls},
  author={Zannettou, Savvas and Caulfield, Tristan and Setzer, William and Sirivianos, Michael and Stringhini, Gianluca and Blackburn, Jeremy},
  journal={arXiv preprint arXiv:1811.03130},
Files (1.7 GB)

