Published January 31, 2022 | Version 1
Preprint Open

News Sharing Networks Expose Information Polluters on Social Media

  • 1. Indiana University Bloomington, USA

Description

The dataset contains networks constructed from COVID-related tweets for detecting low-credibility accounts.
We provide three CSV files, one for each network: the misinformation retweet network, the bipartite news-sharing network, and the news co-sharing network. All files are tab-separated edge lists. Node labels indicate credibility: high, low, or unknown (0, 1, and -1, respectively).
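
As a rough illustration of reading one of these tab-separated edge lists, the sketch below uses only the Python standard library. The column layout (source, target, optional numeric weight) and the sample rows are assumptions made for illustration; the description does not specify them, so check the actual files before relying on this. Only the label encoding is taken from the description.

```python
import csv
import io

# Label encoding stated in the dataset description.
CREDIBILITY = {0: "high", 1: "low", -1: "unknown"}


def read_edge_list(handle):
    """Read tab-separated rows into (source, target, weight) tuples.

    Assumed layout (not confirmed by the description): source, target,
    and an optional numeric weight in the third column. Rows whose third
    field is not numeric (for example a header row) are skipped.
    """
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        weight = 1.0
        if len(row) > 2:
            try:
                weight = float(row[2])
            except ValueError:
                continue  # treat as a header row
        edges.append((row[0], row[1], weight))
    return edges


# Hypothetical in-memory sample standing in for a real file.
sample = io.StringIO("source\ttarget\tweight\nu1\tu2\t3\nu2\tu3\t1\n")
edges = read_edge_list(sample)
```

For a real file, an open file handle can be passed in place of the in-memory sample, for example `open("rt.csv", newline="")`.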

An account score is a weighted mean of the scores of the domains the account shared. Binary labels, rather than scores, are provided for sources to comply with NewsGuard's licensing terms. The paper treats accounts and sources with scores below 60 as low-quality, following NewsGuard's convention. However, this threshold can be changed depending on the use case.
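
The scoring rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the weights are the number of times an account shared each domain, and the domain names and scores are made up, since the real source scores are not distributed with the dataset.

```python
def account_score(share_counts, domain_scores):
    """Weighted mean of domain scores, weighted by share counts.

    Assumption: the weight of a domain is how often the account shared it.
    Domains without a known score are ignored. Returns None when the
    account shared no scored domain.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for domain, count in share_counts.items():
        if domain not in domain_scores:
            continue
        weighted_sum += count * domain_scores[domain]
        total_weight += count
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


def is_low_credibility(score, threshold=60.0):
    """Apply the below-threshold rule; the default of 60 follows the description."""
    return score is not None and score < threshold


# Hypothetical scores and share counts, for illustration only.
domain_scores = {"a.example": 90.0, "b.example": 20.0}
shares = {"a.example": 1, "b.example": 3, "unscored.example": 5}

score = account_score(shares, domain_scores)  # (1*90 + 3*20) / 4
low = is_low_credibility(score)
```

Passing a different `threshold` value changes the cutoff for other use cases.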
To apply the LoCred algorithm, the 'rt.csv' file can be used directly. To apply the other PageRank algorithms, the direction of the edges in the network must be reversed first.
The train-test split procedure for evaluation is described in the paper.
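
The direction reversal mentioned above for the other PageRank algorithms amounts to swapping the endpoints of every edge. The sketch below assumes edges are held as (source, target, weight) tuples, which is an illustrative representation rather than a documented format of the files.

```python
def reverse_edges(edges):
    """Return a new edge list with every edge pointing the opposite way.

    Weights are carried over unchanged; the input list is not modified.
    """
    return [(target, source, weight) for source, target, weight in edges]


# Hypothetical edges, for illustration only.
original = [("u1", "u2", 3.0), ("u2", "u3", 1.0)]
reversed_edges = reverse_edges(original)
```

If the network is loaded into a graph library instead, an equivalent built-in may be available; for example, NetworkX directed graphs provide a `reverse()` method.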

More details can be found in our preprint: https://arxiv.org/abs/2202.00094

Files

infopolluters.zip (820.5 MB)
md5:370e858d6fb728a8ed7d5abfad75d087