There is a newer version of this record available.

Dataset Closed Access

Dataset for "On the Origins of Memes by Means of Fringe Web Communities"

Savvas Zannettou; Tristan Caulfield; Jeremy Blackburn; Emiliano De Cristofaro; Michael Sirivianos; Gianluca Stringhini; Guillermo Suarez-Tangil

This dataset is obsolete because the Twitter pHashes were incompatible with all the other pHashes in the dataset due to a version mismatch of the ImageHash Python library.

Please download the updated dataset here: https://zenodo.org/record/3699670

This dataset was collected with research funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No 691025.
This dataset was used in the publication "On the Origins of Memes by Means of Fringe Web Communities" by Savvas Zannettou, Tristan Caulfield, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, and Guillermo Suarez-Tangil. ACM Internet Measurement Conference (IMC), 2018. DOI: https://doi.org/10.5281/zenodo.1323551

The dataset consists of all the URLs and pHashes for images posted on Twitter, Reddit, 4chan's /pol/, and Gab between July 2016 and the end of July 2017.
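
As an illustration of how pHashes such as these are typically compared, the sketch below computes the Hamming distance (the number of differing bits) between two hashes. It assumes the hashes are stored as fixed-length hexadecimal strings, which is the string form the ImageHash library produces; this record does not state the storage format. The function name and the example values are hypothetical and are not taken from the dataset.

```python
def hamming_distance(phash_a: str, phash_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes of equal length."""
    if len(phash_a) != len(phash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(phash_a, 16) ^ int(phash_b, 16)).count("1")


# Hypothetical 64-bit hashes that differ only in the last bit.
a = "ffd8c0c0e0f0f8fc"
b = "ffd8c0c0e0f0f8fd"
print(hamming_distance(a, b))  # prints 1
```

A small distance suggests visually similar images, but only when both hashes were produced by the same version of the hashing library, which is the compatibility problem noted above for the Twitter pHashes.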

The code related to this research is available at https://github.com/memespaper/memes_pipeline or at https://doi.org/10.5281/zenodo.1463050

Presentation available here: https://doi.org/10.5281/zenodo.1477082

Files are not publicly accessible.
