Dataset Open Access
Savvas Zannettou; Barry Bradlyn; Emiliano De Cristofaro; Haewoon Kwak; Michael Sirivianos; Gianluca Stringhini; Jeremy Blackburn
This dataset was used in the following paper: "What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?". Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Michael Sirivianos, Gianluca Stringhini, Haewoon Kwak, Jeremy Blackburn. Workshop on Computational Methods in CyberSafety, Online Harassment and Misinformation, 2018. DOI: 10.1145/3184558.3191531.
In addition, this project has received funding from the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (Grant Agreement No. 691025). The work reflects only the authors’ views; the Agency and the Commission are not responsible for any use that may be made of the information it contains.
Using Gab’s API, we crawl the social network with a snowball methodology. Specifically, we obtain data for the most popular users as returned by Gab’s API and iteratively collect data from all of their followers as well as their followings. Subsequently, for all users in our dataset, we collect all of their posts. Overall, we collect 22,112,812 posts from 336,752 users, posted between August 2016 and January 2018. The dataset is a JSON file in which each line contains one JSON object.
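Because the file stores one JSON object per line, it can be read as a stream instead of being loaded whole. The following is a minimal Python sketch under stated assumptions: the function names `iter_posts` and `summarize_keys` are hypothetical helpers introduced here, and no field names are assumed, since the record schema is not described above. It only counts which top-level keys appear in the first few records.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        yield json.loads(line)


def summarize_keys(lines: Iterable[str], limit: int = 1000) -> Counter:
    """Count top-level keys across the first `limit` records.

    Useful for discovering the schema, which is not documented here.
    """
    counts: Counter = Counter()
    for i, record in enumerate(iter_posts(lines)):
        if i >= limit:
            break
        counts.update(record.keys())
    return counts


# Example usage (the path is a placeholder, not the real file name):
#
#     with open("path/to/dataset.json", encoding="utf-8") as handle:
#         print(summarize_keys(handle).most_common())
```

Passing the open file handle directly keeps memory use roughly constant, which matters for a file holding about 22 million records.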