
Dataset for: "How over is it?" Understanding the Incel Community on YouTube

Kostantinos Papadamou; Savvas Zannettou; Jeremy Blackburn; Emiliano De Cristofaro; Gianluca Stringhini; Michael Sirivianos


Abstract: YouTube is by far the largest host of user-generated video content worldwide. Alas, the platform also hosts inappropriate, toxic, and hateful content. One community that has often been linked to sharing and publishing hateful and misogynistic content is the so-called Involuntary Celibates (Incels), a loosely defined movement ostensibly focusing on men's issues. In this paper, we set out to analyze the Incel community on YouTube by focusing on this community's evolution over the last decade and understanding whether YouTube's recommendation algorithm steers users towards Incel-related videos. We collect videos shared on Incel communities within Reddit and perform a data-driven characterization of the content posted on YouTube. Among other things, we find that the Incel community on YouTube is gaining traction and that during the last decade, the number of Incel-related videos and comments rose substantially. We also find that users have a 6.3% chance of being suggested an Incel-related video by YouTube's recommendation algorithm within five hops when starting from a non-Incel-related video. Overall, our findings paint an alarming picture of online radicalization: not only is Incel activity increasing over time, but platforms may also play an active role in steering users towards such extreme content.

Dataset Files

The dataset consists of nine files, which include the metadata, comment identifiers, and captions of all the videos collected and analyzed in the paper (Incel-derived set, Control Set, Incel-derived Recommendation Graph, and Control Recommendation Graph), as well as the Incel-related terms lexicon used in our video annotation methodology.

1. Video Metadata

  • "incel_derived_groundtruth_videos.json": Contains the Incel-derived labeled ground-truth videos shared in Incel-related subreddits on Reddit. It includes 6,452 videos (290 Incel-related and 6,162 "Other") annotated following the video annotation methodology described in the paper.
  • "control_groundtruth_videos.json": Contains the randomly selected YouTube videos shared in various subreddits on Reddit. It includes 5,793 videos (66 Incel-related and 5,727 "Other") annotated following the video annotation methodology described in the paper.
  • "incel_derived_recommendation_graph_videos.json": Contains the 37.7K YouTube videos used to construct the Incel-derived recommendation graph. It includes 1,074 Incel-related and 36,673 "Other" videos, annotated following the video annotation methodology described in the paper.
  • "control_recommendation_graph_videos.json": Contains the 29.3K YouTube videos used to construct the Control recommendation graph. It includes 428 Incel-related and 28,866 "Other" videos, annotated following the video annotation methodology described in the paper.
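The metadata files are JSON, so Python's standard library is enough to get started. The sketch below counts the "annotation_label" values in one file, which is a quick way to check a download against the counts listed above. It assumes each file is a single JSON document holding either a list of video records or an object keyed by video ID; that layout, the exact label strings, and the helper names (iter_videos, label_counts) are assumptions, not part of the documented format.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, List


def iter_videos(data: Any) -> List[dict]:
    """Return video records whether the file holds a list or an ID-keyed object.

    ASSUMPTION: the top-level layout of the released files is not documented;
    adjust this helper after inspecting an actual file.
    """
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def label_counts(path: Path) -> Counter:
    """Count the 'annotation_label' values in one metadata file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # fails if the file is JSON Lines rather than one document
    return Counter(video.get("annotation_label") for video in iter_videos(data))
```

For example, `label_counts(Path("incel_derived_groundtruth_videos.json"))` should report 290 Incel-related and 6,162 "Other" videos if the layout assumption holds.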

- Video Metadata Description:

  • "annotation_label": The annotation label assigned to the video by our video annotation methodology.
  • "isSeed": 0 if the video is a seed video in the recommendation graph, 1 if it is a recommended video of a seed video.
  • "relatedVideos": The recommended videos of the given video as returned by the YouTube Data API.
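The "relatedVideos" and "isSeed" fields are enough to rebuild a recommendation graph and to approximate the kind of five-hop figure quoted in the abstract. The sketch below uses uniform random walks to estimate how often a walk of at most five hops from a non-Incel seed video reaches an Incel-related video. It is an illustrative estimator, not the authors' exact procedure. It assumes the file is an object keyed by video ID, that "relatedVideos" is a list of video IDs, that seeds have "isSeed" equal to 0 as described above, and that the Incel label string is "Incel-related"; all function names are hypothetical.

```python
import random
from typing import Dict, List, Set

INCEL_LABEL = "Incel-related"  # ASSUMED label string; verify against the files


def build_graph(videos: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each video ID to the IDs listed in its 'relatedVideos' field."""
    return {vid: list(rec.get("relatedVideos", [])) for vid, rec in videos.items()}


def seed_non_incel(videos: Dict[str, dict], label: str = INCEL_LABEL) -> List[str]:
    """Seed videos (isSeed == 0) that are not annotated with `label`."""
    return [vid for vid, rec in videos.items()
            if rec.get("isSeed") == 0 and rec.get("annotation_label") != label]


def incel_ids(videos: Dict[str, dict], label: str = INCEL_LABEL) -> Set[str]:
    """IDs of all videos annotated with `label`."""
    return {vid for vid, rec in videos.items() if rec.get("annotation_label") == label}


def walk_reaches(graph: Dict[str, List[str]], start: str, targets: Set[str],
                 hops: int, rng: random.Random) -> bool:
    """Take up to `hops` uniformly random steps; True if any step lands on a target."""
    current = start
    for _ in range(hops):
        neighbours = graph.get(current, [])
        if not neighbours:
            return False  # dead end: no recommendations recorded for this video
        current = rng.choice(neighbours)
        if current in targets:
            return True
    return False


def estimate_reach_probability(graph: Dict[str, List[str]], starts: List[str],
                               targets: Set[str], hops: int = 5,
                               walks_per_start: int = 100, seed: int = 0) -> float:
    """Fraction of random walks from `starts` that reach `targets` within `hops`."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    total = hits = 0
    for start in starts:
        for _ in range(walks_per_start):
            total += 1
            hits += walk_reaches(graph, start, targets, hops, rng)
    return hits / total if total else 0.0
```

Walks stop early at videos with no recorded recommendations (for example, leaves of the crawl), which pulls the estimate down; a different treatment of dead ends would give a different number.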

2. Video Comments: 

  • "incel_derived_videos_comments.json": Includes the unique identifiers of the comments of the Incel-derived ground-truth and the Incel-derived Recommendation Graph videos.
  • "control_videos_comments.json": Includes the unique identifiers of the comments of the Control ground-truth and the Control Recommendation Graph videos.

3. Video Transcripts:

  • "incel_derived_videos_transcripts.json": Includes the captions of the Incel-derived ground-truth and the Incel-derived Recommendation Graph videos.
  • "control_videos_transcripts.json": Includes the captions of the Control ground-truth and the Control Recommendation Graph videos.

4. Incel-related Terms Dictionary:

  • "incel_related_terms_dictionary": Includes all 200 terms of the Incel-related terms lexicon described in the paper and used in our video annotation methodology.
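The paper's annotation methodology relies on this lexicon; its exact rules are described in the paper and are not reproduced here. As a starting point, the sketch below finds lexicon terms in a piece of text (for example, a video caption) using whole-word, case-insensitive matching. It assumes the terms have already been loaded into a list of strings, since the file's format is not documented on this page; the function names are hypothetical.

```python
import re
from typing import Iterable, List


def compile_lexicon(terms: Iterable[str]) -> "re.Pattern[str]":
    """Build one case-insensitive regex that matches any term as a whole word or phrase."""
    cleaned = {t.strip() for t in terms if t.strip()}
    if not cleaned:
        raise ValueError("lexicon is empty")
    # Longest terms first, so a multi-word phrase wins over a shorter term it starts with.
    ordered = sorted(cleaned, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # \b assumes terms start and end with word characters (letters, digits, underscore).
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def matched_terms(text: str, pattern: "re.Pattern[str]") -> List[str]:
    """Return the lexicon terms found in `text`, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in pattern.finditer(text)]
```

Counting `len(matched_terms(caption, pattern))` per video gives a simple lexicon-hit score; how such scores map to a label is a separate decision that the paper's methodology, not this sketch, defines.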

If you use this dataset in any publication, of any form and kind, please cite the following paper:

@article{papadamou2020understanding,
  title={"How over is it?" Understanding the {Incel} Community on {YouTube}},
  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
  journal={arXiv preprint arXiv:2001.08293},
  year={2020}
}

Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (GA No. 691025) and the CONCORDIA project (GA No. 830927), the US National Science Foundation (grants: 1942610, 2114407, 2114411, and 2046590), and the UK's National Research Centre on Privacy, Harm Reduction, and Adversarial Influence Online (UKRI grant: EP/V011189/1). This work reflects only the authors' views; the Agency and the Commission are not responsible for any use that may be made of the information it contains.
Restricted Access

You may request access to the files in this upload, provided that you fulfil the conditions below. The decision to grant or deny access rests solely with the record owner.

In order to share the dataset with you, please agree to the following terms:

  1. You will not attempt to use this data to de-anonymize, in any way, any users in this or any other dataset.
  2. You will not re-share the dataset with anyone else not included in this request.
  3. You will appropriately cite the "Understanding the Incel Community on YouTube" paper in any publication, of any form and kind, using this data:
@article{papadamou2020understanding,
  title={"How over is it?" Understanding the {Incel} Community on {YouTube}},
  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
  journal={arXiv preprint arXiv:2001.08293},
  year={2020}
}