Published January 31, 2020 | Version v1
Dataset Restricted

Dataset for: "Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children"

  • 1. Cyprus University of Technology
  • 2. University of Alabama at Birmingham
  • 3. Telefonica Research
  • 4. Boston University

Description

Dataset for paper: Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children

The dataset consists of five files:
1. groundtruth_videos.json: This is the ground truth dataset. It contains 4797 manually annotated videos (1513 suitable, 929 disturbing, 419 restricted, and 1936 irrelevant). Each video's label is stored in the 'classification_label' field.
2. elsagate_related_videos.json: Contains the data for 233K Elsagate-related YouTube videos (1K seed and 232K recommended) that were obtained as described in the paper.
3. other_child_related_videos.json: Contains the data for 155K other child-related YouTube videos (2K seed and 153K recommended) that were obtained as described in the paper.
4. random_videos.json: Contains the data for 482K random YouTube videos (8K seed and 474K recommended) that were obtained as described in the paper.
5. popular_videos.json: Contains the data for 11K popular YouTube videos (500 seed and 10.5K recommended) that were obtained between November 18 and November 21, 2018, as described in the paper.

For every video in all five files, the 'prediction' field holds the label predicted by our classifier.
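
As a rough illustration of working with these fields, the sketch below tallies the values of a field such as 'classification_label' or 'prediction'. This description does not state the on-disk layout of the files, so the loader assumes each file is either a single JSON array of video objects or one JSON object per line. The helper names `load_records` and `count_field` are hypothetical and are not part of the dataset or the paper's code.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load video records from one of the dataset files.

    Assumption: the file holds either a single JSON array of objects
    or one JSON object per line. Neither layout is confirmed by the
    dataset description.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        # Whole file is one JSON array.
        return json.loads(text)
    # Otherwise treat every non-empty line as its own JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_field(records, field):
    """Count how often each value of `field` occurs.

    Records that lack the field are counted under the key None.
    """
    return Counter(record.get(field) for record in records)
```

If the layout assumption holds, calling `count_field(load_records("groundtruth_videos.json"), "classification_label")` should give per-label totals that sum to 4797, and the same call with "prediction" summarizes the classifier's output for any of the five files.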

Notes

Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (Grant Agreement No. 691025) and from the National Science Foundation under grant CNS-1942610.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

In order for me to share the dataset with you, please agree to the following terms:

  1. You will not attempt to use this data to de-anonymize, in any way, any users in this or any other dataset.
  2. You will not re-share the dataset with anyone not included in this request.
  3. You will appropriately cite the "Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children" ICWSM paper in any publication, of any form and kind, using this data:

@inproceedings{papadamou2020disturbedyoutube,
    title={{Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children}},
    author={Papadamou, Kostantinos and Papasavva, Antonis and Zannettou, Savvas and Blackburn, Jeremy and Kourtellis, Nicolas and Leontiadis, Ilias and Stringhini, Gianluca and Sirivianos, Michael},
    booktitle={14th International AAAI Conference on Web and Social Media},
    year={2020},
    organization={AAAI}
}


Additional details

Related works

Is supplement to
Conference paper: 10.5281/zenodo.3739061 (DOI)
Software: 10.5281/zenodo.4534217 (DOI)

Funding

ENCASE – EnhaNcing seCurity And privacy in the Social wEb: a user centered approach for the protection of minors 691025
European Commission