Preprint Open Access
Truong, Bao Tran; Allen, Oliver Melbourne; Menczer, Filippo
The dataset contains networks built from COVID-related tweets, intended for detecting low-credibility accounts.
We provide three CSV files, one for each of three networks: the misinformation retweet network, the bipartite news-sharing network, and the news co-sharing network. All files are tab-separated edge lists. Node labels indicate credibility: high (0), low (1), or unknown (-1).
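A minimal Python sketch for reading one of these files follows. It assumes that each row holds a source node, a target node and an optional numeric weight. The actual column layout, and whether a header row is present, should be checked against the files. `read_edge_list` and `decode_label` are hypothetical helpers. Only the label codes come from the description above.

```python
import csv
import io

# Credibility codes as documented for this dataset.
LABEL_NAMES = {0: "high", 1: "low", -1: "unknown"}


def read_edge_list(handle, skip_header=False):
    """Read a tab-separated edge list into (source, target, weight) tuples.

    ASSUMPTION: column 0 is the source node, column 1 the target node and an
    optional column 2 a numeric weight (defaulting to 1.0 when absent).
    """
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)  # drop a header row if the file has one
    edges = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        source, target = row[0], row[1]
        weight = float(row[2]) if len(row) > 2 else 1.0
        edges.append((source, target, weight))
    return edges


def decode_label(code):
    """Map a numeric credibility code (0, 1 or -1) to its name."""
    return LABEL_NAMES[int(code)]


# Usage with an in-memory sample instead of a real file.
sample = io.StringIO("a\tb\t3\nb\tc\n")
print(read_edge_list(sample))  # [('a', 'b', 3.0), ('b', 'c', 1.0)]
print(decode_label("-1"))  # unknown
```

For a real file, pass `open("rt.csv", newline="")` in place of the in-memory sample.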
An account's score is a weighted mean of the scores of the domains it shares. To comply with NewsGuard's licensing terms, sources are given binary labels rather than scores. Following NewsGuard's convention, the paper treats accounts and sources scoring below 60 as low-quality. This threshold can be changed to suit other use cases.
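The sketch below makes the scoring rule concrete. It computes a weighted mean and then applies the threshold of 60. It assumes that the weights are per-domain share counts. That is an illustrative choice, not the paper's confirmed definition. The domain names and scores are made up, because the dataset itself ships binary source labels rather than NewsGuard scores.

```python
def account_score(domain_scores, share_counts):
    """Weighted mean of the scores of the domains an account shared.

    ASSUMPTION: each domain's weight is the number of times the account
    shared it; the paper defines the actual weighting.
    """
    total_weight = sum(share_counts[d] for d in domain_scores)
    if total_weight == 0:
        raise ValueError("account has no weighted shares")
    weighted_sum = sum(domain_scores[d] * share_counts[d] for d in domain_scores)
    return weighted_sum / total_weight


def is_low_credibility(score, threshold=60.0):
    """Scores strictly below the threshold (60 in the paper) count as low."""
    return score < threshold


# Hypothetical domains: one scored 100 shared once, one scored 20 shared three times.
scores = {"site-a.example": 100.0, "site-b.example": 20.0}
counts = {"site-a.example": 1, "site-b.example": 3}
score = account_score(scores, counts)  # (100 * 1 + 20 * 3) / 4
print(score, is_low_credibility(score))  # 40.0 True
```

Passing a different `threshold` to `is_low_credibility` corresponds to adjusting the cutoff for another use case.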
The 'rt.csv' file can be used directly with the LoCred algorithm. To apply the other PageRank-based algorithms, first reverse the direction of the network's edges.
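The edge reversal can be sketched as follows. The description does not say which way the edges in 'rt.csv' point, so this sketch only shows the mechanics. It flips every edge and then runs plain weighted PageRank by power iteration. This is standard PageRank, not LoCred, and `reverse_edges` and `pagerank` are hypothetical helper names.

```python
def reverse_edges(edges):
    """Flip every (source, target, weight) edge to (target, source, weight)."""
    return [(target, source, weight) for source, target, weight in edges]


def pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Standard weighted PageRank by power iteration (not LoCred).

    Assumes positive edge weights. Rank held by dangling nodes (nodes with
    no out-edges) is redistributed uniformly over all nodes.
    """
    nodes = sorted({node for s, t, _ in edges for node in (s, t)})
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        # Teleportation plus the uniformly spread dangling mass.
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for s, t, w in edges:
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: a -> b and c -> b.
edges = [("a", "b", 1.0), ("c", "b", 1.0)]
print(pagerank(edges))  # b ranks highest
print(pagerank(reverse_edges(edges)))  # after reversal, a and c outrank b
```

With networkx, `DiGraph.reverse()` followed by `networkx.pagerank` should give an equivalent result.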
The train-test split procedure for evaluation is described in the paper.
More details can be found in our preprint: https://arxiv.org/abs/2202.00094
| | All versions | This version |
|---|---|---|
| Data volume | 4.1 GB | 4.1 GB |