
Dataset for "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls"

Savvas Zannettou; Tristan Caulfield; William Setzer; Michael Sirivianos; Gianluca Stringhini; Jeremy Blackburn

This is the dataset used in the study "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls" by Savvas Zannettou, Tristan Caulfield, William Setzer, Michael Sirivianos, Gianluca Stringhini, and Jeremy Blackburn (arXiv, 2019). Dataset DOI: 10.5281/zenodo.2558560

The dataset consists of the data released by Twitter in October 2018 for Russian and Iranian state-sponsored troll accounts (available at <URL>), together with intermediate data that we generated while processing the raw data.
For instance, we include the trained Word2Vec and LDA models, the output of our influence estimation experiments using Hawkes Processes, and the other data needed to reproduce the results in the paper.
To use the provided data, download the compressed file from <URL>, decompress it, and make sure that the uncompressed data folder is in the same directory as the IPython Notebook.
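The notebook expects the uncompressed data folder to sit next to it. The following is a minimal Python sketch of that setup step, assuming the download is in an archive format that Python's standard `shutil.unpack_archive` recognizes (for example `.zip` or `.tar.gz`). The helper `prepare_data` and the folder name `data` are hypothetical and are not taken from the released code; adjust the folder name to whatever top-level folder the archive actually contains.

```python
import shutil
from pathlib import Path


def prepare_data(archive_path: str, notebook_dir: str, data_dir_name: str = "data") -> Path:
    """Unpack the downloaded archive next to the notebook and return the data folder.

    Hypothetical helper: `data_dir_name` is an assumed folder name, not one
    documented by the dataset authors.
    """
    notebook_dir_path = Path(notebook_dir)
    # shutil.unpack_archive infers the archive format (.zip, .tar.gz, ...) from the extension.
    shutil.unpack_archive(archive_path, extract_dir=notebook_dir_path)
    data_dir = notebook_dir_path / data_dir_name
    # Fail early if the expected folder is not where the notebook will look for it.
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Expected folder {data_dir} after unpacking; check the archive layout."
        )
    return data_dir
```

For example, `prepare_data("trolls_dataset.zip", ".")` would unpack a (hypothetically named) archive into the current directory, which should be the one containing the notebook.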

The code used for this study can be found here:

Please cite our paper in any publication, of any form or kind, that results from your use of this data:

  title={Who let the trolls out? towards understanding state-sponsored trolls},
  author={Zannettou, Savvas and Caulfield, Tristan and Setzer, William and Sirivianos, Michael and Stringhini, Gianluca and Blackburn, Jeremy},
  journal={arXiv preprint arXiv:1811.03130},
Total file size: 1.7 GB

