SoK: Machine Learning for Misinformation Detection

Xiao, Madelyne

doi:10.5281/zenodo.15613696

Published June 7, 2025 | Version v1

Dataset Open

Contributors

Data curator: Xiao, Madelyne

Annotations and replication materials for 'SoK: Machine Learning for Misinformation Detection'

I've included descriptions of file contents below.

annotations_aec.tsv: Contains annotations for our full paper corpus, comprising 248 published works. We annotated these papers for target, dataset curation, model choice, feature selection, and evaluation.

paper_selection_criteria.txt: Our criteria for assembling the full and focus coding sets, adapted from pages 3, 5 ('Paper selection') and 6 of the manuscript.

replications.zip: Within this zip archive, you'll find three subfolders, each corresponding to one of the three replication analyses found on pages 11-13 of the manuscript. For each subfolder, we've included (in parentheses) the manuscript subsection where that dataset / codebase is discussed:

articles (5.1): includes original and modified Reuters and NYTimes texts and accompanying labels (these are new datasets that we introduced for the sake of robustness testing). Also includes the FA-KES and ISOT datasets and the classifier (new_RNN_CNN.py) used by the original study authors.
users (5.2): includes troll and non-troll summary statistics, by account, with accompanying label. Also includes the classifier used by the original study author.
sources (5.3): includes splits, classifier, and datasets used by the original author.
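
To check that a downloaded copy of replications.zip contains the subfolders described above, a minimal sketch along these lines may help. The function name count_files_by_top_folder is hypothetical, and because the exact layout inside the archive is not documented here (the three subfolders might sit under a wrapping directory), folder names are read from the archive rather than hard-coded.

```python
import zipfile
from collections import Counter


def count_files_by_top_folder(zip_path):
    """Return a Counter mapping each top-level entry in the archive to its file count.

    Files stored at the archive root (no folder) are counted under their own name.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries; count files only
            top_level = info.filename.split("/", 1)[0]
            counts[top_level] += 1
    return counts


# Example use (assumes replications.zip is in the working directory):
#   for folder, n in sorted(count_files_by_top_folder("replications.zip").items()):
#       print(f"{folder}: {n} file(s)")
```

If the output shows a single wrapping folder rather than articles, users, and sources, the same split can be applied one level deeper.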

Notes on open-source availability for each codebase: the source-scoped replication code is freely available online. We received permission from the authors of the article-scoped study to open-source their code. We previously contacted the author of the user-scoped work (TrollMagnifier) and have not received a response. We are sharing their code here for the sake of artifact evaluation; open-source availability is pending an affirmative response from the author.
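
As a starting point for working with annotations_aec.tsv, the sketch below reads the column names and counts the data rows without assuming any particular schema. It assumes the file is UTF-8 encoded and that its first row is a header; neither is stated in this record, and summarize_tsv is a hypothetical helper name. The row count is not guaranteed to equal the 248 papers in the corpus, since the file layout is not described here.

```python
import csv


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-separated file.

    Assumes the first row is a header and the file is UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


# Example use (assumes the file is in the working directory):
#   columns, n_rows = summarize_tsv("annotations_aec.tsv")
#   print(columns, n_rows)
```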

Files (1.2 MB)

Name	MD5	Size
annotations_aec.tsv	174aea89bfa98f6953d982d3833a03a5	179.0 kB
paper_selection_criteria.txt	6cc3c374c2709f0f180473f9d70ca6c6	2.7 kB
replications.zip	6e7388122e814f2b20eb72485529680e	972.1 kB
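
The MD5 checksums listed above can be used to confirm that the files downloaded intact. The following is a minimal sketch, assuming the three files have been saved under their original names in a single directory; md5_of and verify are hypothetical helper names, not part of the deposit.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "annotations_aec.tsv": "174aea89bfa98f6953d982d3833a03a5",
    "paper_selection_criteria.txt": "6cc3c374c2709f0f180473f9d70ca6c6",
    "replications.zip": "6e7388122e814f2b20eb72485529680e",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Return {name: True | False | None}; None means the file was not found."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == checksum) if path.exists() else None
    return results


# Example use (assumes the downloads are in the working directory):
#   print(verify("."))
```

MD5 is used here only because it is the checksum the record publishes; it detects corrupted downloads but is not a defense against deliberate tampering.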

Additional details

Available: 2025-06-07

	All versions	This version
Views	204	204
Downloads	96	96
Data volume	32.6 MB	32.6 MB