Dataset Open Access

Dataset for "On the Origins of Memes by Means of Fringe Web Communities"

Savvas Zannettou; Tristan Caulfield; Jeremy Blackburn; Emiliano De Cristofaro; Michael Sirivianos; Gianluca Stringhini; Guillermo Suarez-Tangil

This dataset was collected with research funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No 691025.
The publication in which this dataset was used is: "On the Origins of Memes by Means of Fringe Web Communities". Savvas Zannettou, Tristan Caulfield, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, and Guillermo Suarez-Tangil. ACM Internet Measurement Conference (IMC), 2018. DOI:


The dataset consists of all the URLs and perceptual hashes (pHashes) for images posted on Twitter, Reddit, 4chan's /pol/, and Gab between July 2016 and the end of July 2017.
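
Perceptual hashes are typically compared by Hamming distance, i.e. the number of bits in which two hashes differ. The following is a minimal sketch of such a comparison, not the authors' code. It assumes the hashes are stored as equal-length hexadecimal strings, which this page does not confirm; the example hash values and the `max_distance` threshold of 8 are hypothetical and chosen only for illustration.

```python
def hamming_distance(phash_a: str, phash_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes.

    Assumption: both hashes are hexadecimal strings of equal length.
    """
    if len(phash_a) != len(phash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(phash_a, 16) ^ int(phash_b, 16)).count("1")


def is_near_duplicate(phash_a: str, phash_b: str, max_distance: int = 8) -> bool:
    """Treat two images as visually similar if their hashes are close.

    The default threshold is a hypothetical value, not taken from the paper.
    """
    return hamming_distance(phash_a, phash_b) <= max_distance


# Hypothetical 64-bit hashes, written as 16 hex digits.
hash_a = "ffff0000ffff0000"
hash_b = "ffff0000ffff0001"  # differs from hash_a in one bit
hash_c = "0000ffff0000ffff"  # differs from hash_a in every bit

print(hamming_distance(hash_a, hash_b))
print(is_near_duplicate(hash_a, hash_b))
print(is_near_duplicate(hash_a, hash_c))
```

A small distance suggests that two URLs point to visually similar images, so a comparison of this kind could be used to group image variants across the four platforms.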

The code related to this research can be found here:, or here: 10.5281/zenodo.1463050

Presentation available here:

Files (5.3 GB)

