SoK: Machine Learning for Misinformation Detection
Description
Annotations and replication materials for 'SoK: Machine Learning for Misinformation Detection'
Descriptions of the file contents follow.
annotations_aec.tsv: Contains annotations for our full paper corpus, comprising 248 published works. We annotated these papers for target, dataset curation, model choice, feature selection, and evaluation.
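As a starting point for working with the annotations, the file can be read with Python's standard library. This is a minimal sketch, not part of the released materials: it assumes the first line of annotations_aec.tsv is a header row, which this record does not confirm, and the helper names `load_annotations` and `column_counts` are hypothetical. No column names are assumed; the script prints whatever header it finds.

```python
import csv
from collections import Counter
from pathlib import Path


def load_annotations(path):
    """Read a tab-separated file into a list of row dictionaries.

    Assumption: the first line is a header row (not confirmed by the record).
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def column_counts(rows, column):
    """Count how often each value appears in the given column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    tsv_path = Path("annotations_aec.tsv")
    if tsv_path.exists():
        rows = load_annotations(tsv_path)
        print(f"{len(rows)} rows")
        # Print the actual header rather than guessing column names.
        print("columns:", list(rows[0].keys()) if rows else [])
```

Once the real column names are known, `column_counts(rows, "<column name>")` gives a quick tally of the values in any annotation column.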
paper_selection_criteria.txt: Our criteria for assembling the full and focus coding sets, adapted from pages 3, 5 ('Paper selection'), and 6 of the manuscript.
replications.zip: This archive contains three subfolders, one for each of the three replication analyses on pages 11-13 of the manuscript. Each subfolder is listed below with the manuscript subsection where its dataset / codebase is discussed:
- articles (5.1): includes original and modified Reuters and NYTimes texts and accompanying labels (these are new datasets that we introduced for the sake of robustness testing). Also includes the FA-KES and ISOT datasets and the classifier (new_RNN_CNN.py) used by the original study authors.
- users (5.2): includes per-account summary statistics for troll and non-troll accounts, with accompanying labels. Also includes the classifier used by the original study author.
- sources (5.3): includes splits, classifier, and datasets used by the original author.
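To check the layout of replications.zip before extracting it, the top-level folders can be listed with the standard library. This is a minimal sketch: the helper name `top_level_folders` is hypothetical, and the archive may wrap the three subfolders in an enclosing directory, in which case only that wrapper is reported.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_folders(zip_path):
    """Return the sorted names of the top-level folders inside a zip archive."""
    names = set()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep the first path component only when it is a folder:
            # either an explicit directory entry or the parent of a file.
            if parts and (len(parts) > 1 or info.is_dir()):
                names.add(parts[0])
    return sorted(names)
```

Calling `top_level_folders("replications.zip")` on the downloaded archive shows which folder names to expect after extraction.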
Notes on open-source availability for each codebase: the source-scoped replication code is freely available online. We received permission from the authors of the article-scoped study to open-source their code. We contacted the author of the user-scoped work (TrollMagnifier) but have not received a response. We are sharing their code here for the sake of artifact evaluation; open-source availability is pending an affirmative response from the author.
Files
paper_selection_criteria.txt
Additional details
Dates
- Available: 2025-06-07