Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube Videos (TPDL 2025)

Kikkawa, Jiro; Takaku, Masao

doi:10.5281/zenodo.15377209

Published May 10, 2025 | Version v1

Dataset Restricted

1. Wakayama University
2. University of Tsukuba

This dataset and supplementary tables are released in conjunction with the TPDL 2025 paper titled “How Retracted Research Persists on YouTube: Retraction Severity, Visibility, and Disclosure.” They provide detailed information used in the analysis to promote transparency, ensure reproducibility, and facilitate future studies on scholarly communication and retractions.

The dataset contains the following files:

Filename	Data Format	Description
01_dataset_scholarly_references_on_YouTube.json.gz	JSON Lines	An integrated dataset of scholarly references in YouTube video descriptions, covering videos posted up to the end of December 2023. This dataset combines the Altmetric dataset and the YA Domain Dataset and is the basis for identifying references to retracted articles. This dataset contains 743,529 scholarly references (386,628 unique DOIs) found in 322,521 YouTube videos uploaded by 77,974 channels.
02_dataset_references_to_retracted_articles_on_YouTube.json.gz	JSON Lines	A dataset of references to retracted articles in YouTube videos, used as the primary source for analysis in the paper. The dataset was created by cross-referencing the integrated reference dataset with the Retraction Watch database. It includes metadata such as DOI, article title, retraction reason, and severity classification (Severe, Moderate, or Minor) based on Woo and Walsh (2024), along with video- and channel-level statistics (e.g., view counts and subscriber counts) retrieved via the YouTube Data API v3 as of April 22, 2025. This dataset contains 1,002 references to retracted articles (360 unique DOIs) found in 956 YouTube videos uploaded by 714 channels.
03_full_list_table3_sorted_by_reference_count_retracted_articles_on_YouTube.json.gz	JSON Lines	Complete list corresponding to Table 3, "Top 7 retracted articles ranked by the number of YouTube videos in which they are referenced." in the paper.
04_full_list_table5_top10_most-viewed_video.json.gz	JSON Lines	Complete list corresponding to Table 5, "Top 10 most-viewed YouTube videos that reference retracted articles, sorted by video view count." in the paper.
05_detailed_manual_coding_40_sampled_retracted_articles.xlsx	XLSX	Detailed annotations for a manually coded sample of 40 YouTube videos referencing retracted scholarly articles. The sample includes 10 randomly selected videos from each of the four analytical groups, defined by publication timing (before/after retraction) and retraction severity (Moderate/Severe). For each video, the file records the reference stance, any visual or verbal mention of the article, and relevant timestamps where applicable. This file supplements the manual analysis results presented in Tables 6 and 7 of the paper.
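
For researchers who are granted access, the gzip-compressed JSON Lines files (01 to 04) can be read one record per line. The following is a minimal Python sketch, not the authors' own tooling: the helper names read_jsonl_gz and count_by_field are hypothetical, and the field name "severity" in the usage note is an assumption, since the record does not document the actual key names.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed JSON Lines file."""
    # Open in text mode so each line arrives as a decoded string.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_field(path: str, field: str) -> Counter:
    """Tally the values of one field; records lacking the field count under None."""
    # The field name is supplied by the caller because the real keys are not documented here.
    return Counter(record.get(field) for record in read_jsonl_gz(path))
```

As a usage example, count_by_field("02_dataset_references_to_retracted_articles_on_YouTube.json.gz", "severity") would tally references per severity class, provided the severity classification is stored under a key with that name. Inspecting the keys of the first record is a sensible first step before relying on any field name.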

Due to concerns over potential misuse (e.g., identification or harassment of individual content creators), this dataset is not made publicly available.
Researchers who wish to use this dataset for scholarly purposes may contact the authors to request access.

References

Woo, S., Walsh, J.P.: On the shoulders of fallen giants: What do references to retracted research tell us about citation behaviors? Quantitative Science Studies 5(1), 1–30 (2024). https://doi.org/10.1162/qss_a_00303
Kikkawa, J., Takaku, M.: How Retracted Research Persists on YouTube: Retraction Severity, Visibility, and Disclosure. Accepted for publication in the Proceedings of the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025).
Accepted Papers (TPDL2025) - https://tpdl2025.github.io/Program/accepted_papers.html

Funding

JSPS KAKENHI Grant Numbers JP22K18147 and JP23K11761.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.
