A study on real graphs of fake news spreading on Twitter

Bodaghi, Amirhosein

doi:10.5281/zenodo.5225338

Published March 15, 2020 | Version v2

Dataset Open


Bodaghi, Amirhosein¹

1. Federal University of Rio de Janeiro

*** Fake News on Twitter ***

These 5 datasets are the result of an empirical study of how newly emerged fake news spreads on Twitter. In particular, we focused on fake news stories that gave rise to a truth spreading against them at the same time. The story behind each fake news item is as follows:

1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

The data were collected in two stages, each of which produced a new dataset: 1- obtaining the Dataset of Diffusion (DD), which contains information about the fake news/truth tweets and retweets; 2- querying the neighbors of the tweet spreaders, which gives the Dataset of Graph (DG).

DD

The DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story. Its structure is as follows:

Each row corresponds to one captured tweet/retweet related to the rumor, and each column gives a specific piece of information about that tweet/retweet. From left to right, the columns are:
User ID (the user who posted the current tweet/retweet)
Number of tweets/retweets the user had published at the time of posting the current tweet/retweet
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time the current tweet/retweet was posted
Number of likes (favorites) the current tweet had received before it was crawled
Number of times the current tweet had been retweeted before it was crawled
Whether the current tweet/retweet contains another tweet (this happens, for example, when the current tweet is a quote, reply, or retweet)
Source (OS) of the device from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet, the ID of the tweet that the current post retweets)
Quote ID (if the post is a quote, the ID of the tweet that the current post quotes)
Reply ID (if the post is a reply, the ID of the tweet that the current post replies to)
Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times it appears in the dataset as a retweet posted by others)
State of the tweet, which can be one of the following (determined by agreement between the annotators):

r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news that neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (it contains the query terms related to the rumor but does not refer to the given fake news)
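
As a rough illustration of the column layout above, the sketch below turns one DD row (a plain list of 16 cell values in left-to-right order) into a dictionary and tallies the state codes. The field names in `DD_COLUMNS` are invented here for readability and do not come from the files. How the rows are read from the .xlsx file (for example with a third-party Excel reader) and whether the sheet has a header row are left open.

```python
from collections import Counter

# Hypothetical field names for the 16 columns, in left-to-right order.
# They are labels chosen for this sketch, not names taken from the files.
DD_COLUMNS = [
    "user_id", "user_post_count", "language", "followers", "friends",
    "created_at", "likes", "retweets", "contains_other_tweet", "source",
    "post_id", "retweet_id", "quote_id", "reply_id", "frequency", "state",
]

# Meaning of the state codes, as described in the list above.
STATE_MEANING = {
    "r": "fake news post",
    "a": "truth post",
    "q": "question about the fake news",
    "n": "not related to the fake news",
}


def row_to_record(row):
    """Map one DD row (a sequence of 16 cell values) to a dict."""
    if len(row) != len(DD_COLUMNS):
        raise ValueError(f"expected {len(DD_COLUMNS)} cells, got {len(row)}")
    return dict(zip(DD_COLUMNS, row))


def count_states(rows):
    """Count how many rows fall into each state category."""
    counts = Counter()
    for row in rows:
        code = str(row_to_record(row)["state"]).strip().lower()
        counts[STATE_MEANING.get(code, "unknown state code")] += 1
    return counts
```

Calling `count_states(rows)` on all rows of one FNx_DD file would give the number of fake news, truth, question, and unrelated posts for that story.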

DG

The DG for each fake news story consists of two files:

A file in graph format (.graph) that contains the graph information, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)
A file in JSONL format (.jsonl) that contains the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

The labels file is needed because, in the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels). The other node IDs follow in the next rows, one user ID per row. Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file when counting rows from 0 as above, which is the 202nd line of the file.
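
The lookup described above can be sketched as follows. This is a minimal example that assumes each line after the first is a single JSON value holding one user ID; the exact content of each line is not specified here, so the function simply returns whatever `json.loads` yields for that line. The function name and the sample data are made up for illustration.

```python
import io
import json


def user_id_for_label(lines, label):
    """Return the parsed JSON value stored for graph node `label`.

    `lines` is any iterable of text lines, e.g. an open FNx_Labels.jsonl file.
    Counting rows from 0, row 0 holds the column labels, so node k is at row k + 1.
    """
    if label < 0:
        raise ValueError("node labels start at 0")
    for row_number, line in enumerate(lines):
        if row_number == label + 1:
            return json.loads(line)
    raise IndexError(f"no row for node label {label}")


# Made-up sample standing in for a labels file: a header row, then two IDs.
sample = io.StringIO('"user_id"\n12345637\n987\n')
print(user_id_for_label(sample, 0))  # prints 12345637
```

With a real file, `lines` would be the file object returned by `open("FN1_Labels.jsonl", encoding="utf-8")`; reading line by line avoids loading the whole file into memory.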

The user IDs of the spreaders in DG (those who have a post in DD) also appear in DD, where extra information about them and their tweets/retweets is available. The other user IDs in DG are neighbors of these spreaders and might not exist in DD.
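
One way to separate spreaders from neighbors, given the user IDs of both datasets, is sketched below. The function name is hypothetical, and the sketch assumes the IDs from DG and DD have already been loaded and use the same type (for example, all integers or all strings).

```python
def split_spreaders(dg_user_ids, dd_user_ids):
    """Split DG user IDs into spreaders (present in DD) and neighbors (absent from DD).

    The order of `dg_user_ids` is preserved in both returned lists.
    """
    dd_ids = set(dd_user_ids)  # set membership keeps the check fast on large graphs
    spreaders = [uid for uid in dg_user_ids if uid in dd_ids]
    neighbors = [uid for uid in dg_user_ids if uid not in dd_ids]
    return spreaders, neighbors
```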

Files

Files (606.9 MB)

Name	MD5	Size
FN1_DD.xlsx	d57d6f039871111e5746e6f461123d90	56.8 kB
FN1_DG.graph	7c56c7e9b72e1f1a2391cd82af604d18	69.8 MB
FN1_Labels.jsonl	dda1813ccccd07187699e2633f458eba	14.5 MB
FN2_DD.xlsx	d6a3220506f2cce316f755d16638c078	104.5 kB
FN2_DG.graph	25c6098ab3985a2369fbe49ed1990c94	109.7 MB
FN2_Labels.jsonl	536b9fdfb15468507dc1c8008bf1827f	23.7 MB
FN3_DD.xlsx	cc6cb1bf5ebdd27010e76cbef80227c5	70.8 kB
FN3_DG.graph	79dac761c08c8b2adb7dbd66d4a788e0	135.0 MB
FN3_Labels.jsonl	850be8666ea2e787b40b302683f0a216	36.6 MB
FN4_DD.xlsx	08a28cb2500ab2446477980e6f863433	77.1 kB
FN4_DG.graph	589807d14caf63835148ecf4bcbc7694	47.1 MB
FN4_Labels.jsonl	565e68c7fa12d139d33f017ad10afa65	12.1 MB
FN5_DD.xlsx	32a5c83a4672d5b6d6e975f4f9f74c97	124.3 kB
FN5_DG.graph	3adb2ed8ceed78b27602b7ed0e33624b	137.7 MB
FN5_Labels.jsonl	e853d5f2f19c72c7882884d7a47c2afc	20.3 MB
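
A downloaded file can be checked against the MD5 value listed for it above. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the chunk size are choices made for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large .graph files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value, with or without an 'md5:' prefix."""
    listed = listed.strip().lower()
    if listed.startswith("md5:"):
        listed = listed[len("md5:"):]
    return md5_of_file(path) == listed
```

For example, `matches_listed_md5("FN1_DD.xlsx", "md5:d57d6f039871111e5746e6f461123d90")` should return `True` for an intact download of that file.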

