Dataset for: "How over is it?" Understanding the Incel Community on YouTube
Creators
- 1. Cyprus University of Technology
- 2. Max Planck Institute
- 3. Binghamton University
- 4. University College London
- 5. Boston University
Description
Dataset for the paper: "How over is it?" Understanding the Incel Community on YouTube
Abstract: YouTube is by far the largest host of user-generated video content worldwide. Alas, the platform also hosts inappropriate, toxic, and hateful content. One community that has often been linked to sharing and publishing hateful and misogynistic content is the so-called Involuntary Celibates (Incels), a loosely defined movement ostensibly focusing on men's issues. In this paper, we set out to analyze the Incel community on YouTube by focusing on this community's evolution over the last decade and understanding whether YouTube's recommendation algorithm steers users towards Incel-related videos. We collect videos shared on Incel communities within Reddit and perform a data-driven characterization of the content posted on YouTube. Among other things, we find that the Incel community on YouTube is gaining traction and that during the last decade, the number of Incel-related videos and comments rose substantially. We also find that users have a 6.3% chance of being suggested an Incel-related video by YouTube's recommendation algorithm within five hops when starting from a non-Incel-related video. Overall, our findings paint an alarming picture of online radicalization: not only is Incel activity increasing over time, but platforms may also play an active role in steering users towards such extreme content.
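The five-hop figure in the abstract can be read as a random-walk estimate over a recommendation graph. The sketch below is a minimal illustration under stated assumptions, not the paper's exact procedure: it assumes the graph is an adjacency dict mapping a video ID to its recommended video IDs, that each hop picks one recommendation uniformly at random, and that a walk stops early (as a non-hit) when a video has no recorded recommendations. The function name and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Set


def incel_hit_probability(
    graph: Dict[str, List[str]],
    incel_videos: Set[str],
    start_videos: List[str],
    hops: int = 5,
    walks_per_start: int = 100,
    seed: int = 0,
) -> float:
    """Hypothetical helper: estimate the fraction of random walks that reach an
    Incel-related video within `hops` steps from the given starting videos."""
    rng = random.Random(seed)  # fixed seed so estimates are reproducible
    hits = 0
    total = 0
    for start in start_videos:
        for _ in range(walks_per_start):
            total += 1
            current = start
            for _ in range(hops):
                neighbours = graph.get(current, [])
                if not neighbours:
                    break  # dead end: no recommendations collected for this video
                # ASSUMPTION: every recommendation is equally likely to be followed.
                current = rng.choice(neighbours)
                if current in incel_videos:
                    hits += 1
                    break  # count each walk at most once
    return hits / total if total else 0.0
```

For example, on a toy graph where every walk from `"a"` reaches the Incel-related video `"x"` on the second hop, the estimate is 1.0 with `hops=5` and 0.0 with `hops=1`. In practice the start set would be the non-Incel-related seed videos, and `incel_videos` the videos labeled Incel-related.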
Dataset Files
The dataset consists of nine files, which include the metadata, comments, and captions of all the videos collected and analyzed in this paper (Incel-derived set, Control Set, Incel-derived Recommendation Graph, and Control Recommendation Graph), as well as the Incel Terms lexicon that we use in our video annotation methodology.
1. Video Metadata:
- "incel_derived_groundtruth_videos.json": Contains the Incel-derived labeled ground-truth videos shared in Incel-related subreddits on Reddit. It includes 6,452 videos (290 Incel-related and 6,162 "Other") annotated following the video annotation methodology described in the paper.
- "control_groundtruth_videos.json": Contains the randomly selected YouTube videos shared in various subreddits on Reddit. It includes 5,793 videos (66 Incel-related and 5,727 "Other") annotated following the video annotation methodology described in the paper.
- "incel_derived_recommendation_graph_videos.json": Contains the 37.7K YouTube videos used to construct the Incel-derived recommendation graph. It includes 1,074 Incel-related and 36,673 "Other" videos annotated following the video annotation methodology described in the paper.
- "control_recommendation_graph_videos.json": Contains the 29.3K YouTube videos used to construct the Control recommendation graph. It includes 428 Incel-related and 28,866 "Other" videos annotated following the video annotation methodology described in the paper.
- Video Metadata Description:
  - "annotation_label": The annotation label assigned to the video by our video annotation methodology.
  - "isSeed": 0 if the video is a seed video in the recommendation graph; 1 if it is a video recommended from a seed video.
  - "relatedVideos": The recommended videos of the given video, as returned by the YouTube Data API.
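The fields above are enough to tally labels, rebuild the recommendation graph, and pick out the seed videos. This is a minimal sketch under assumptions this record does not confirm: each metadata file is taken to be a JSON object keyed by YouTube video ID, and "relatedVideos" is taken to be a list of video ID strings. The helper names `load_videos` and `summarize` are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def load_videos(path: str) -> Dict[str, dict]:
    # ASSUMPTION: the file is a single JSON object keyed by YouTube video ID.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(videos: Dict[str, dict]) -> Tuple[Counter, Dict[str, List[str]], List[str]]:
    """Hypothetical helper: count labels, build an adjacency dict, list seeds."""
    # Tally whatever label strings actually appear in "annotation_label".
    labels = Counter(v.get("annotation_label") for v in videos.values())
    # ASSUMPTION: "relatedVideos" holds a list of video ID strings.
    graph = {vid: list(v.get("relatedVideos") or []) for vid, v in videos.items()}
    # Per the field description, isSeed == 0 marks a seed video.
    seeds = [vid for vid, v in videos.items() if v.get("isSeed") == 0]
    return labels, graph, seeds
```

Usage would look like `labels, graph, seeds = summarize(load_videos("incel_derived_recommendation_graph_videos.json"))`. The exact label strings (for example "Incel-related" versus "Other") should be checked against the files rather than assumed; counting them with `Counter` avoids hard-coding them.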
2. Video Comments:
- "incel_derived_videos_comments.json": Includes the unique identifiers of the comments of the Incel-derived ground-truth and the Incel-derived Recommendation Graph videos.
- "control_videos_comments.json": Includes the unique identifiers of the comments of the Control ground-truth and the Control Recommendation Graph videos.
3. Video Transcripts:
- "incel_derived_videos_transcripts.json": Includes the captions of the Incel-derived ground-truth and the Incel-derived Recommendation Graph videos.
- "control_videos_transcripts.json": Includes the captions of the Control ground-truth and the Control Recommendation Graph videos.
4. Incel-related Terms Dictionary:
- "incel_related_terms_dictionary": Contains all 200 terms of the Incel-related terms lexicon described in the paper and used in our video annotation methodology.
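One simple way to relate the lexicon to the caption files is to count term occurrences in a video's transcript. This is only an illustrative sketch, not the paper's annotation methodology (which this record does not spell out): it assumes the lexicon has been loaded as a list of term strings and the captions as plain text, and it uses case-insensitive whole-word matching. The function name is hypothetical, and the terms in the test are illustrative rather than taken from the lexicon file.

```python
import re
from typing import Dict, Iterable


def count_lexicon_hits(text: str, terms: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count case-insensitive, whole-word occurrences of
    each lexicon term in a transcript; terms with zero hits are omitted."""
    lowered = text.lower()
    counts: Dict[str, int] = {}
    for term in terms:
        # re.escape keeps punctuation in a term literal; the \b anchors stop a
        # short term from matching inside a longer word. Caveat: \b only works
        # as intended when the term starts and ends with a word character.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        n = len(re.findall(pattern, lowered))
        if n:
            counts[term] = n
    return counts
```

Whole-word matching is a deliberate choice here: plain substring search would report short terms inside unrelated words, inflating counts.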
If you use this dataset in any publication, of any form and kind, please cite the following:
@article{papadamou2020understanding,
title={"How over is it?" Understanding the incel community on {YouTube}},
author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
journal={arXiv preprint arXiv:2001.08293},
year={2020}
}
Related works
- Is supplement to
- Conference paper: 10.5281/zenodo.4769412 (DOI)