News Sharing Networks Expose Information Polluters on Social Media

Truong, Bao Tran; Allen, Oliver Melbourne; Menczer, Filippo

doi:10.5281/zenodo.5932514

Published January 31, 2022 | Version 1

Preprint Open

1. Indiana University Bloomington, USA

The dataset includes networks constructed from COVID-related tweets for detecting low-credibility accounts.
We provide three CSV files, one per network: the misinformation retweet network, the bipartite news-sharing network, and the news co-sharing network. All files are tab-separated edge lists. Node labels indicate credibility: high (0), low (1), or unknown (-1).

An account score is a weighted mean of the scores of the domains the account shared. For sources, binary labels are provided rather than scores, to comply with NewsGuard's licensing terms. Following NewsGuard's convention, the paper treats accounts and sources with a score below 60 as low-quality. This threshold can be changed depending on the use case.
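
The scoring and thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the weights are the number of times an account shared each domain (the description does not specify the weighting), and the names `account_score`, `credibility_label`, and the example domains are hypothetical.

```python
def account_score(share_counts, domain_scores):
    """Weighted mean of domain scores.

    Assumption: each domain is weighted by how often the account shared it.
    Domains without a known score are ignored.
    """
    weighted_sum = 0.0
    total_weight = 0
    for domain, count in share_counts.items():
        score = domain_scores.get(domain)
        if score is None:
            continue  # unrated domain, contributes nothing
        weighted_sum += count * score
        total_weight += count
    if total_weight == 0:
        return None  # no rated domains shared
    return weighted_sum / total_weight


def credibility_label(score, threshold=60):
    """Map a score to the dataset's label convention: 0 high, 1 low, -1 unknown."""
    if score is None:
        return -1
    return 1 if score < threshold else 0


# Hypothetical data: domain names and scores are made up for illustration.
shares = {"a.example": 3, "b.example": 1, "c.example": 5}
scores = {"a.example": 40.0, "b.example": 100.0}

example_score = account_score(shares, scores)  # (3*40 + 1*100) / 4
example_label = credibility_label(example_score)
```

Passing a different `threshold` to `credibility_label` corresponds to changing the cutoff for a different use case.
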
To apply the LoCred algorithm, the 'rt.csv' file can be used directly. To apply the other PageRank algorithms, the edge directions of the network need to be reversed.
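
Reversing the edge directions of a tab-separated edge list could look like the sketch below. It assumes the first two columns hold the source and target nodes, that any further columns (such as weights) should be carried over unchanged, and that there is no header row; none of this is confirmed by the description, so check the actual files first. The helper names are hypothetical.

```python
import csv
import io


def read_edge_list(handle):
    """Read tab-separated rows, keeping those with at least two columns."""
    reader = csv.reader(handle, delimiter="\t")
    return [row for row in reader if len(row) >= 2]


def reverse_edges(rows):
    """Swap source and target in every row, leaving extra columns untouched."""
    return [[row[1], row[0], *row[2:]] for row in rows]


# In-memory stand-in for a file; node IDs and weights are made up.
sample = io.StringIO("u1\tu2\t3\nu3\tu1\t1\n")
edges = read_edge_list(sample)
reversed_edges = reverse_edges(edges)
```

For a real file, the same functions would be called on a handle such as `open("rt.csv", newline="")`. If the graph is loaded into networkx instead, `DiGraph.reverse()` achieves the same effect.
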
The train-test split procedure for evaluation is described in the paper.

More details can be found in our preprint: https://arxiv.org/abs/2202.00094

Files

Name	Size	Checksum
infopolluters.zip	820.5 MB	md5:370e858d6fb728a8ed7d5abfad75d087

	All versions	This version
Views	449	447
Downloads	69	69
Data volume	56.6 GB	56.6 GB
