Dataset Restricted Access

Dataset for: "How over is it?" Understanding the Incel Community on YouTube

Kostantinos Papadamou; Savvas Zannettou; Jeremy Blackburn; Emiliano De Cristofaro; Gianluca Stringhini; Michael Sirivianos


JSON Export

{
  "owners": [
    34839
  ], 
  "doi": "10.5281/zenodo.4557039", 
  "stats": {
    "version_unique_downloads": 2.0, 
    "unique_views": 78.0, 
    "views": 105.0, 
    "version_views": 105.0, 
    "unique_downloads": 2.0, 
    "version_unique_views": 78.0, 
    "volume": 4209468576.0, 
    "version_downloads": 26.0, 
    "downloads": 26.0, 
    "version_volume": 4209468576.0
  }, 
  "links": {
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4557038.svg", 
    "doi": "https://doi.org/10.5281/zenodo.4557039", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4557038", 
    "latest_html": "https://zenodo.org/record/4557039", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4557039.svg", 
    "html": "https://zenodo.org/record/4557039", 
    "latest": "https://zenodo.org/api/records/4557039"
  }, 
  "conceptdoi": "10.5281/zenodo.4557038", 
  "created": "2021-05-18T08:19:27.489410+00:00", 
  "updated": "2021-08-22T13:03:04.458884+00:00", 
  "conceptrecid": "4557038", 
  "revision": 6, 
  "id": 4557039, 
  "metadata": {
    "access_right_category": "danger", 
    "doi": "10.5281/zenodo.4557039", 
    "description": "<p><strong>Dataset for the paper: &quot;How over is it?&quot; Understanding the Incel Community on YouTube</strong></p>\n\n<p><strong>Abstract:</strong>&nbsp;YouTube is by far the largest host of user-generated video content worldwide.&nbsp;Alas, the platform also hosts inappropriate, toxic, and hateful content.&nbsp;One community that has often been linked to sharing and publishing hateful and misogynistic content is the so-called Involuntary Celibates (Incels), a loosely defined movement ostensibly focusing on men&#39;s issues.&nbsp;In this paper, we set out to analyze the Incel community on YouTube by focusing on this community&#39;s evolution over the last decade and understanding whether YouTube&#39;s recommendation algorithm steers users towards Incel-related videos.&nbsp;We collect videos shared on Incel communities within Reddit and perform a data-driven characterization of the content posted on YouTube.&nbsp;Among other things, we find that the Incel community on YouTube is getting traction and that during the last decade, the number of Incel-related videos and comments rose substantially.&nbsp;We also find that users have a 6.3% chance of being suggested an Incel-related video by YouTube&#39;s recommendation algorithm within five hops when starting from a non-Incel-related video.&nbsp;Overall, our findings paint an alarming picture of online radicalization: not only Incel activity is increasing over time, but platforms may also play an active role in steering users towards such extreme content.</p>\n\n<p><strong>Dataset Files</strong></p>\n\n<p>The dataset consists of nine&nbsp;files, which include the metadata, comments, and captions of all the videos collected and analyzed in this paper (Incel-derived set, Control Set, Incel-derived Recommendation Graph, and Control Recommendation Graph), as well as the Incel Terms lexicon that we use in our video annotation methodology.</p>\n\n<p><strong>1. Video Metadata</strong></p>\n\n<ul>\n\t<li><strong>&quot;incel_derived_groundtruth_videos.json&quot;:</strong>&nbsp;Contains the Incel-derived labeled ground-truth videos shared in Incel-related subreddits on Reddit. It includes 6,452 videos (290 Incel-related and 6,162 &quot;Other&quot;) annotated following the video annotation methodology described in the paper.</li>\n\t<li><strong>&quot;control_groundtruth_videos.json&quot;:</strong> Contains the randomly selected YouTube videos shared in various subreddits on Reddit. It includes 5,793 videos (66 Incel-related and 5,727 &quot;Other&quot;)&nbsp;annotated following the video annotation methodology described in the paper.</li>\n\t<li><strong>&quot;incel_derived_recommendation_graph_videos.json&quot;:</strong> Contains the 37.7K YouTube videos used to construct the Incel-derived recommendation graph. We have 1,074 Incel-related videos and 36,673 &quot;Other&quot; videos annotated&nbsp;following the video annotation methodology described in the paper.</li>\n\t<li><strong>&quot;control_recommendation_graph_videos.json&quot;:</strong>&nbsp;Contains the 29.3K YouTube videos used to construct the Control recommendation graph. We have 428 Incel-related videos and 28,866 &quot;Other&quot; videos annotated following the video annotation methodology described in the paper.</li>\n</ul>\n\n<p><strong>- Video Metadata Description:</strong></p>\n\n<ul>\n\t<li><em>&quot;annotation_label&quot;</em>: The annotation label assigned to the video by&nbsp;our video annotation methodology.</li>\n\t<li><em>&quot;isSeed&quot;</em>: 0 if the video is a seed video in the recommendation graph, 1 if it is a recommended video of a seed video.</li>\n\t<li><em>&quot;relatedVideos&quot;</em>: The recommended videos of the given video as returned by the YouTube Data API.</li>\n</ul>\n\n<p><strong>2. Video Comments:&nbsp;</strong></p>\n\n<ul>\n\t<li><strong>&quot;incel_derived_videos_comments.json&quot;:</strong>&nbsp;Includes the unique identifiers of the comments of the Incel-derived ground-truth&nbsp;and the Incel-derived Recommendation Graph videos.</li>\n\t<li><strong>&quot;control_videos_comments.json&quot;:</strong>&nbsp;Includes the unique identifiers of the comments of the Control ground-truth and the Control Recommendation Graph videos.</li>\n</ul>\n\n<p><strong>3. Video Transcripts:</strong></p>\n\n<ul>\n\t<li><strong>&quot;incel_derived_videos_transcripts.json&quot;:</strong> Includes the captions of&nbsp;the Incel-derived ground-truth&nbsp;and the Incel-derived Recommendation Graph videos.</li>\n\t<li><strong>&quot;control_videos_transcripts.json&quot;:</strong> Includes the captions of the Control ground-truth and the Control Recommendation Graph videos.</li>\n</ul>\n\n<p><strong>4. Incel-related Terms Dictionary:</strong></p>\n\n<ul>\n\t<li><strong>&quot;incel_related_terms_dictionary&quot;:</strong> It includes all the 200 terms of the Incel-related terms lexicon mentioned in the paper and used in our video annotation methodology.</li>\n</ul>\n\n<p>If you use this dataset in any publication, of any form and kind, please cite using this data:</p>\n\n<pre><code>@article{papadamou2020understanding,\n  title={\"How over is it?\" Understanding the incel community on youtube},\n  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},\n  journal={arXiv preprint arXiv:2001.08293},\n  year={2020}\n}</code></pre>", 
    "title": "Dataset for: \"How over is it?\" Understanding the Incel Community on YouTube", 
    "notes": "Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (GA No. 691025) and the CONCORDIA project (GA No. 830927), the US National Science Foundation (grants: 1942610, 2114407, 2114411, and 2046590), and the UK's National Research Centre on Privacy, Harm Reduction, and Adversarial Influence Online (UKRI grant: EP/V011189/1). This work reflects only the authors' views; the Agency and the Commission are not responsible for any use that may be made of the information it contains.", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4557038"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4557039"
          }
        }
      ]
    }, 
    "access_conditions": "<p>In order to share the&nbsp;dataset&nbsp;with you, please agree to the following terms:</p>\n\n<ol>\n\t<li>You will not attempt to use this data to de-anonymize, in any way, any users in this or any other&nbsp;dataset.</li>\n\t<li>You will not re-share the&nbsp;dataset&nbsp;with anyone else not included in this request.</li>\n\t<li>You will appropriately cite the &quot;Understanding the Incel Community on YouTube&quot; paper in any publication, of any form and kind, using this data:</li>\n</ol>\n\n<pre><code>@article{papadamou2020understanding,\n  title={\"How over is it?\" Understanding the incel community on youtube},\n  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},\n  journal={arXiv preprint arXiv:2001.08293},\n  year={2020}\n}</code></pre>\n\n<p>&nbsp;</p>", 
    "grants": [
      {
        "code": "830927", 
        "links": {
          "self": "https://zenodo.org/api/grants/10.13039/501100000780::830927"
        }, 
        "title": "Cyber security cOmpeteNce fOr Research anD Innovation", 
        "acronym": "CONCORDIA", 
        "program": "H2020", 
        "funder": {
          "doi": "10.13039/501100000780", 
          "acronyms": [], 
          "name": "European Commission", 
          "links": {
            "self": "https://zenodo.org/api/funders/10.13039/501100000780"
          }
        }
      }, 
      {
        "code": "691025", 
        "links": {
          "self": "https://zenodo.org/api/grants/10.13039/501100000780::691025"
        }, 
        "title": "EnhaNcing seCurity And privacy in the Social wEb: a user centered approach for the protection of minors", 
        "acronym": "ENCASE", 
        "program": "H2020", 
        "funder": {
          "doi": "10.13039/501100000780", 
          "acronyms": [], 
          "name": "European Commission", 
          "links": {
            "self": "https://zenodo.org/api/funders/10.13039/501100000780"
          }
        }
      }
    ], 
    "keywords": [
      "YouTube", 
      "YouTube Videos", 
      "YouTube's Recommendation Algorithm", 
      "Involuntary Celibates", 
      "Incels", 
      "Incel-related Videos", 
      "Dataset"
    ], 
    "publication_date": "2021-03-22", 
    "creators": [
      {
        "affiliation": "Cyprus University of Technology", 
        "name": "Kostantinos Papadamou"
      }, 
      {
        "affiliation": "Max Planck Institute", 
        "name": "Savvas Zannettou"
      }, 
      {
        "affiliation": "Binghamton University", 
        "name": "Jeremy Blackburn"
      }, 
      {
        "affiliation": "University College London", 
        "name": "Emiliano De Cristofaro"
      }, 
      {
        "affiliation": "Boston University", 
        "name": "Gianluca Stringhini"
      }, 
      {
        "affiliation": "Cyprus University of Technology", 
        "name": "Michael Sirivianos"
      }
    ], 
    "access_right": "restricted", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4769412", 
        "relation": "isSupplementTo", 
        "resource_type": "publication-conferencepaper"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4557038", 
        "relation": "isVersionOf"
      }
    ]
  }
}
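
A record export like the one above can be inspected programmatically. The following is a minimal Python sketch, not part of the dataset or of any Zenodo tooling. It assumes the export has already been parsed into a dict (for example with `json.load`), and that the restricted video metadata files, once obtained, parse into a list of objects carrying the documented `annotation_label` field. The function names `summarize_record` and `count_labels`, the interpretation of `volume` as bytes, and the list-of-objects file layout are all assumptions made for illustration.

```python
from collections import Counter
from typing import Any


def summarize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Pull a few descriptive fields out of a parsed record export."""
    meta = record.get("metadata", {})
    stats = record.get("stats", {})
    return {
        "doi": record.get("doi"),
        "title": meta.get("title"),
        "access_right": meta.get("access_right"),
        "creators": [c.get("name") for c in meta.get("creators", [])],
        "grants": [g.get("acronym") for g in meta.get("grants", [])],
        # Assumption: "volume" is a byte count stored as a float.
        "volume_gb": round(stats.get("volume", 0.0) / 1e9, 1),
    }


def count_labels(videos: list[dict[str, Any]]) -> Counter:
    """Tally videos by "annotation_label"; the label strings are not assumed."""
    return Counter(v.get("annotation_label", "<missing>") for v in videos)


if __name__ == "__main__":
    # A trimmed copy of the export above, kept inline so the sketch runs alone.
    record = {
        "doi": "10.5281/zenodo.4557039",
        "stats": {"volume": 4209468576.0},
        "metadata": {
            "title": 'Dataset for: "How over is it?" Understanding the Incel Community on YouTube',
            "access_right": "restricted",
            "creators": [{"name": "Kostantinos Papadamou"}],
            "grants": [{"acronym": "CONCORDIA"}, {"acronym": "ENCASE"}],
        },
    }
    print(summarize_record(record))
```

Counting labels with a `Counter` rather than hard-coding label strings keeps the sketch independent of how the labels are actually spelled in the files; the resulting tallies can then be compared against the per-file counts given in the description.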