Preprint Open Access

News Sharing Networks Expose Information Polluters on Social Media

Truong, Bao Tran; Allen, Oliver Melbourne; Menczer, Filippo

The dataset includes networks constructed from COVID-related tweets for detecting low-credibility accounts.
We provide three CSV files, corresponding to three networks: the misinformation retweet network, the bipartite news-sharing network, and the news co-sharing network. All files are tab-separated edge lists. Node labels indicate credibility: high, low, or unknown (0, 1, and -1, respectively).
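As a minimal sketch of reading one of these edge lists, the snippet below parses tab-separated rows with Python's standard csv module. The column layout (endpoints in the first two columns, no header row) and the sample rows are assumptions for illustration; check the files in the archive before relying on them.

```python
import csv
import io


def read_edge_list(handle):
    """Parse a tab-separated edge list into (source, target) pairs.

    Assumes the first two columns hold the edge endpoints; any further
    columns are ignored. This layout is an assumption, not taken from
    the dataset documentation.
    """
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Hypothetical sample rows standing in for a real file such as 'rt.csv'.
sample = io.StringIO("userA\tuserB\nuserB\tuserC\n")
edges = read_edge_list(sample)
```

For a file on disk, the same function can be called as read_edge_list(open("rt.csv", newline="")), ideally inside a with block.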

An account score is a weighted mean of the scores of the domains the account shared. To comply with the NewsGuard license, binary labels are provided for sources rather than scores. The paper treats accounts and sources scoring below 60 as low-quality, following NewsGuard's convention, but this threshold can be changed depending on the use case.
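The scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the weights are assumed to be the number of times an account shared each domain, and the domain names and scores are hypothetical (the dataset ships binary labels only, so real scores would have to come from a licensed source).

```python
def account_score(share_counts, domain_scores):
    """Weighted mean of domain scores for one account.

    share_counts: {domain: number of shares} (weights, an assumption).
    domain_scores: {domain: credibility score on a 0-100 scale}.
    Domains without a known score are skipped. Returns None when the
    account shared no scored domain.
    """
    total = 0.0
    weight = 0
    for domain, count in share_counts.items():
        if domain in domain_scores:
            total += count * domain_scores[domain]
            weight += count
    return total / weight if weight else None


def is_low_credibility(score, threshold=60):
    """True when the score falls strictly below the threshold."""
    return score is not None and score < threshold


# Hypothetical inputs for illustration only.
shares = {"example-a.com": 3, "example-b.com": 1}
scores = {"example-a.com": 40.0, "example-b.com": 100.0}
score = account_score(shares, scores)  # (3 * 40 + 1 * 100) / 4
```

Passing a different threshold argument to is_low_credibility corresponds to changing the cutoff for a given use case.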
To apply the LoCred algorithm, the 'rt.csv' file can be used directly. To apply the other PageRank algorithms, the edge directions of the network must be reversed.
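Reversing the network amounts to swapping the endpoints of every edge. A minimal sketch, assuming the edges have already been loaded as (source, target) pairs (the sample node names are hypothetical):

```python
def reverse_edges(edges):
    """Return a new edge list with every (source, target) pair flipped."""
    return [(target, source) for source, target in edges]


# Hypothetical sample edges for illustration only.
edges = [("userA", "userB"), ("userB", "userC")]
reversed_edges = reverse_edges(edges)
```

The original list is left unchanged, so both orientations remain available for the different algorithms.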
The train-test split procedure for evaluation is described in the paper.

More details can be found in our pre-print: https://arxiv.org/abs/2202.00094

Files (820.5 MB)
Name: infopolluters.zip
Size: 820.5 MB
md5: 370e858d6fb728a8ed7d5abfad75d087