There is a newer version of the record available.

Published March 4, 2021 | Version v1
Software Open

Software for: "It is just a flu: Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations"

  • 1. Cyprus University of Technology
  • 2. Max Planck Institute
  • 3. Binghamton University
  • 4. University College London
  • 5. Boston University

Description

Abstract: The role played by YouTube's recommendation algorithm in unwittingly promoting misinformation and conspiracy theories is not entirely understood. Yet, this can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, such as the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the Flat Earth theory, as well as the anti-vaccination and anti-mask movements. Using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant and train a deep learning classifier to detect pseudoscientific videos with an accuracy of 0.79. We quantify user exposure to this content on various parts of the platform and how this exposure changes based on the user's watch history. We find that YouTube suggests more pseudoscientific content regarding traditional pseudoscientific topics (e.g., flat earth, anti-vaccination) than for emerging ones (like COVID-19). At the same time, these recommendations are more common on the search results page than on a user's homepage or when actively watching videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.

Preprint available here.

What do we offer in this software?

We make the following tools and libraries publicly available to the research and open-source communities:

  1. The codebase of a Deep Learning Classifier for detecting pseudoscientific videos on YouTube, with examples of how to train and test it;

  2. A library that simplifies the usage of the trained classifier and implements all the required tasks for the classification of YouTube videos;

  3. An open-source library that provides a unified framework for assessing the effects of personalization on YouTube video recommendations in multiple parts of the platform: a) the homepage; b) the search results page; and c) the video recommendations section (recommendations when watching videos).

The codebase is also available on GitHub.
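To make the third item more concrete, the sketch below shows one simple way to summarize exposure per part of the platform: the fraction of observed videos labeled as pseudoscience on the homepage, the search results page, and the video recommendations section. It is only an illustration and does not use the API of the released library. The function names, the section names, and the toy data are all assumptions introduced here; the label names follow the three annotation classes mentioned in the abstract.

```python
from collections import Counter

# Label names mirror the three annotation classes named in the abstract.
# The exact strings are an assumption made for this sketch.
PSEUDOSCIENCE = "pseudoscience"


def pseudoscience_share(labels):
    """Return the fraction of labels equal to PSEUDOSCIENCE (0.0 if empty)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts[PSEUDOSCIENCE] / total if total else 0.0


def exposure_by_section(observations):
    """Group (section, label) pairs by section and compute the share per section.

    `observations` is a hypothetical input format: an iterable of
    (platform_section, video_label) tuples.
    """
    grouped = {}
    for section, label in observations:
        grouped.setdefault(section, []).append(label)
    return {
        section: pseudoscience_share(labels)
        for section, labels in grouped.items()
    }


if __name__ == "__main__":
    # Made-up toy observations, NOT results from the paper.
    toy_observations = [
        ("homepage", "science"),
        ("homepage", "irrelevant"),
        ("search_results", "pseudoscience"),
        ("search_results", "science"),
        ("video_recommendations", "irrelevant"),
    ]
    for section, share in exposure_by_section(toy_observations).items():
        print(f"{section}: {share:.2f}")
```

Running the same summary for user profiles with different watch histories would give a rough side-by-side view of how exposure shifts with personalization, which is the kind of comparison the framework is described as supporting.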

If you make use of any modules available in this codebase in your work, please cite the following paper:

@article{papadamou2020just,
  title={"It is just a flu": Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations},
  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
  journal={arXiv preprint arXiv:2010.11638},
  year={2020}
}

 

Notes

Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (Grant Agreement No. 691025) and from the National Science Foundation under grant CNS-1942610.

Files

pseudoscience-youtube-paper-codebase.zip (321.3 kB)
md5:db45c639e41457f06b85740e99a9e2b1
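After downloading the archive, its integrity can be checked against the MD5 checksum listed for it on this record. The following is a minimal sketch using only the Python standard library; it assumes the archive was saved under its original file name in the current directory, and the helper names `md5_of_file` and `verify_archive` are introduced here for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for the archive on this record.
EXPECTED_MD5 = "db45c639e41457f06b85740e99a9e2b1"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved under its original name.
    archive = Path("pseudoscience-youtube-paper-codebase.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is suitable here only as a check against corrupted or incomplete downloads, not as a security guarantee.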

Additional details

Related works

Is supplement to
Preprint: 10.5281/zenodo.4484045 (DOI)
Is supplemented by
Dataset: 10.5281/zenodo.4558469 (DOI)

Funding

CONCORDIA – Cyber security cOmpeteNCe fOr Research anD InnovAtion (Grant No. 830927), European Commission