Published October 8, 2018 | Version v2
Dataset Open

Dataset for "On the Origins of Memes by Means of Fringe Web Communities"

  • 1. Cyprus University of Technology
  • 2. University College London
  • 3. University of Alabama at Birmingham
  • 4. Boston University
  • 5. King's College London


This dataset was collected with research funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No 691025.
The publication in which this dataset was used is: "On the Origins of Memes by Means of Fringe Web Communities". Savvas Zannettou, Tristan Caulfield, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, and Guillermo Suarez-Tangil. ACM Internet Measurement Conference (IMC), 2018. DOI:


The dataset consists of all the URLs and perceptual hashes (pHashes) for images from Twitter, Reddit, 4chan's /pol/, and Gab posted between July 2016 and the end of July 2017.
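
Perceptual hashes are typically compared by Hamming distance, so that visually similar images yield hashes differing in only a few bits. The following is a minimal sketch of such a comparison, not the authors' pipeline. It assumes the hashes are stored as equal-length hexadecimal strings, which this record does not confirm; the function name and the two example values are hypothetical and are not taken from the dataset.

```python
def hamming_distance(phash_a: str, phash_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes.

    Assumption: both hashes are hexadecimal strings of equal length.
    """
    if len(phash_a) != len(phash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(phash_a, 16) ^ int(phash_b, 16)).count("1")


# Hypothetical example values, not taken from the dataset.
first = "ffd8a1c3e0f07018"
second = "ffd8a1c3e0f07019"
print(hamming_distance(first, second))
```

A small distance suggests the two images are near-duplicates; the threshold to use depends on the hash length and on the analysis, and is not specified here.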

The code related to this research can be found here:, or here: 10.5281/zenodo.1463050

Presentation available here:


Files (5.3 GB)


Additional details


ENCASE – EnhaNcing seCurity And privacy in the Social wEb: a user centered approach for the protection of minors 691025
European Commission