Dataset Restricted Access

Dataset for: "It is just a flu: Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations"

Kostantinos Papadamou; Savvas Zannettou; Jeremy Blackburn; Emiliano De Cristofaro; Gianluca Stringhini; Michael Sirivianos


JSON-LD (schema.org) Export

{
  "description": "<p><strong>Dataset for the paper: &quot;It is just a flu: Assessing the Effect of Watch History on YouTube&rsquo;s Pseudoscientific Video Recommendations&quot;</strong></p>\n\n<p><strong>Abstract:</strong>&nbsp;</p>\n\n<p>The role played by YouTube&rsquo;s recommendation algorithm in unwittingly promoting misinformation and conspiracy theories is not entirely understood.&nbsp;Yet, this can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, such as the COVID-19 pandemic.&nbsp;In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the Flat Earth theory, as well as the anti-vaccination and anti-mask movements.&nbsp;Using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant and train a deep learning classifier to detect pseudoscientific videos with an accuracy of 0.79.</p>\n\n<p>We quantify user exposure to this content on various parts of the platform and how this exposure changes based on the user&rsquo;s watch history. We find that YouTube suggests more pseudoscientific content regarding traditional pseudoscientific topics (e.g., flat earth, anti-vaccination) than for emerging ones (like COVID-19).&nbsp;At the same time, these recommendations are more common on the search results page than on a user&rsquo;s homepage or in the recommendation section when actively watching videos. Finally, we shed light on how a user&rsquo;s watch history substantially affects the type of recommended videos.</p>\n\n<p>&nbsp;</p>\n\n<p><strong>Dataset Files</strong></p>\n\n<p>The dataset consists of three files: the metadata, comments, and captions of the ground-truth dataset&nbsp;videos collected and manually reviewed in this paper.</p>\n\n<p><strong>1. 
Video Metadata</strong></p>\n\n<ul>\n\t<li><strong>&quot;groundtruth_videos.json&quot;:</strong>&nbsp;Contains the metadata of our manually reviewed ground-truth dataset videos. The ground-truth dataset includes 1,197 science, 1,325 pseudoscience, and 3,212 irrelevant videos. More specifically, it includes the metadata of videos related to the following pseudoscientific topics:\n\n\t<ul>\n\t\t<li>COVID-19 (607 science, 368 pseudoscience, and 721 irrelevant videos)</li>\n\t\t<li>Anti-vaccination (363 science, 394 pseudoscience, and 1,060 irrelevant videos)</li>\n\t\t<li>Anti-mask (65 science, 188 pseudoscience, and 724 irrelevant videos)</li>\n\t\t<li>Flat Earth (162 science, 375 pseudoscience, and 707 irrelevant videos)</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>Note that 600 of the videos in this dataset include the <em><strong>&quot;annotation.manual_review_label&quot;</strong></em>&nbsp;attribute,&nbsp;which is the label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.</p>\n\n<p><strong>- Video Metadata Description:</strong></p>\n\n<ul>\n\t<li><em>&quot;search_term&quot;</em>: The search term used to query YouTube and retrieve this video during our data collection. 
It can be one of the following search terms: &#39;covid-19&#39;, &#39;coronavirus&#39;, &#39;anti-vaccination&#39;, &#39;anti-vaxx&#39;, &#39;anti-mask&#39;, or &#39;flat earth&#39;.</li>\n\t<li><em>&quot;annotation.annotations&quot;</em>: The list of the three annotations assigned to each video by our crowdsourced annotators.</li>\n\t<li><em>&quot;annotation.label&quot;</em>: The annotation label assigned to the video based on the majority agreement of the crowdsourced annotators.</li>\n\t<li><em>&quot;annotation.manual_review_label&quot;</em>: The label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.</li>\n\t<li><em>&quot;isSeed&quot;</em>: 1 if the video is a seed video of our data collection, 0 if it is a recommended video of a seed video.</li>\n\t<li><em>&quot;relatedVideos&quot;</em>: The recommended videos of the given video as returned by the YouTube Data API.</li>\n</ul>\n\n<p><strong>2. Video Comments</strong></p>\n\n<ul>\n\t<li><strong>&quot;groundtruth_videos_comments_ids.json&quot;:</strong>&nbsp;Includes the identifiers of the comments of our ground-truth videos.</li>\n</ul>\n\n<p><strong>3. Video Transcripts</strong></p>\n\n<ul>\n\t<li><strong>&quot;groundtruth_videos_transcripts.json&quot;:</strong> Includes the captions of our ground-truth videos.</li>\n</ul>\n\n<p>If you use this dataset in any publication, of any form and kind, please cite the following paper:</p>\n\n<pre><code>@article{papadamou2020just,\n    title={'It is just a flu': Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations},\n    author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},\n    journal={arXiv preprint arXiv:2010.11638},\n    year={2020}\n}</code></pre>", 
  "creator": [
    {
      "affiliation": "Cyprus University of Technology", 
      "@type": "Person", 
      "name": "Kostantinos Papadamou"
    }, 
    {
      "affiliation": "Max Planck Institute", 
      "@type": "Person", 
      "name": "Savvas Zannettou"
    }, 
    {
      "affiliation": "Binghamton University", 
      "@type": "Person", 
      "name": "Jeremy Blackburn"
    }, 
    {
      "affiliation": "University College London", 
      "@type": "Person", 
      "name": "Emiliano De Cristofaro"
    }, 
    {
      "affiliation": "Boston University", 
      "@type": "Person", 
      "name": "Gianluca Stringhini"
    }, 
    {
      "affiliation": "Cyprus University of Technology", 
      "@type": "Person", 
      "name": "Michael Sirivianos"
    }
  ], 
  "url": "https://zenodo.org/record/4769731", 
  "datePublished": "2021-02-24", 
  "keywords": [
    "YouTube", 
    "YouTube Videos", 
    "YouTube's Recommendation Algorithm", 
    "Science", 
    "Pseudoscience", 
    "Pseudoscientific Misinformation", 
    "Watch History", 
    "COVID-19", 
    "Anti-vaccination", 
    "Anti-mask", 
    "Flat Earth"
  ], 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.4769731", 
  "@id": "https://doi.org/10.5281/zenodo.4769731", 
  "@type": "Dataset", 
  "name": "Dataset for: \"It is just a flu: Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations\""
}
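
The metadata fields described in the export above can be explored with a short script. The following is a minimal sketch, not the authors' tooling. It assumes that "groundtruth_videos.json" holds a single JSON array of video objects, that dotted names such as "annotation.label" may appear either as flat keys or as nested objects, and that labels are the strings "science", "pseudoscience", and "irrelevant". None of these layout details are confirmed by the record; the helper names and the sample records are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def get_field(record, dotted_key, default=None):
    """Read a field such as "annotation.label" from a flat or nested record."""
    if dotted_key in record:  # flat layout: {"annotation.label": ...}
        return record[dotted_key]
    value = record
    for part in dotted_key.split("."):  # nested layout: {"annotation": {"label": ...}}
        if not isinstance(value, dict) or part not in value:
            return default
        value = value[part]
    return value


def majority_label(annotations):
    """Return the label chosen by more than half of the annotators, else None."""
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes > len(annotations) / 2 else None


def tally_labels(records):
    """Count videos per (search_term, annotation.label) pair."""
    counts = Counter()
    for record in records:
        key = (get_field(record, "search_term"), get_field(record, "annotation.label"))
        counts[key] += 1
    return counts


def load_records(path):
    """Assumption: the file holds one JSON array of video objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Made-up records for illustration only; they are not taken from the dataset.
sample = [
    {"search_term": "flat earth",
     "annotation": {"annotations": ["pseudoscience", "pseudoscience", "science"],
                    "label": "pseudoscience"}},
    {"search_term": "covid-19",
     "annotation.annotations": ["science", "science", "irrelevant"],
     "annotation.label": "science"},
]

for (term, label), n in sorted(tally_labels(sample).items()):
    print(term, label, n)
```

With access to the restricted files, `tally_labels(load_records("groundtruth_videos.json"))` could be compared against the per-topic counts listed in the description, provided the file layout matches the assumptions above.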