YouTube8M-MusicTextClips

McKee, Daniel; Salamon, Justin; Sivic, Josef; Russell, Bryan

doi:10.5281/zenodo.8040754

Published June 14, 2023 | Version 1.0.0

Dataset | Open access


1. University of Illinois at Urbana-Champaign
2. Adobe Research
3. Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University

YouTube8M-MusicTextClips Dataset

This page includes the YouTube8M-MusicTextClips dataset from our CVPR 2023 paper:

Language-Guided Music Recommendation for Video via Prompt Analogies
Daniel McKee¹, Justin Salamon², Josef Sivic²,³, Bryan Russell²
¹University of Illinois at Urbana-Champaign, ²Adobe Research, ³Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University

The dataset is licensed under a research-only, non-commercial Adobe Research License. Please see the attached LICENSE.txt file for more information.

Dataset Description

The YouTube8M-MusicTextClips dataset consists of 4,169 high-quality, human-written text descriptions of the music in video clips from the YouTube8M dataset.

For each selected YouTube music video, we extracted a 10-second clip from the middle of the video for annotation. Annotators were given only the audio of this clip, so the text annotations describe the audio alone, not the visual content of the clip.
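The annotated clip boundaries are already provided in the split files, so there is no need to recompute them. As a rough illustration of what "a 10-second clip from the middle of the video" means in arithmetic terms, the sketch below centres a fixed-length window on the video midpoint. The helper `middle_clip` is hypothetical, and centring the window exactly on the midpoint is an assumption: the description does not state how the window was aligned.

```python
def middle_clip(duration: float, clip_len: float = 10.0) -> tuple[float, float]:
    """Return (start, end) in seconds of a clip_len-second window centred on the
    midpoint of a video lasting `duration` seconds.

    Hypothetical helper: assumes "middle" means exactly centred, which the
    dataset description does not specify.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if duration <= clip_len:
        # Video is shorter than the clip: fall back to the whole video.
        return 0.0, float(duration)
    start = duration / 2 - clip_len / 2
    return start, start + clip_len


if __name__ == "__main__":
    print(middle_clip(240.0))  # (115.0, 125.0)
```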

The dataset annotations are divided into train and test split files. Because the dataset is intended mainly for evaluation, the test split contains 3,169 annotated clips and the train split only 1,000.

Each file contains the following information for each sample:

video_id: The YouTube ID corresponding to the video containing an annotated clip
start: Start time (in seconds) of the annotated clip in the video
end: End time (in seconds) of the annotated clip in the video
text: The text annotation describing the music from the annotated clip
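A split file can be read with Python's standard `csv` module. The sketch below assumes that `train.csv` and `test.csv` are UTF-8 CSV files with a header row using exactly the four field names listed above; that layout is an assumption, so check it against the actual files. `ClipAnnotation`, `read_split` and `youtube_url` are hypothetical names used only for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class ClipAnnotation:
    video_id: str  # YouTube ID of the video containing the clip
    start: float   # clip start time in seconds
    end: float     # clip end time in seconds
    text: str      # free-text description of the music


def read_split(f: TextIO) -> List[ClipAnnotation]:
    """Parse one split file, assuming a header row named
    video_id,start,end,text (an assumption, not confirmed by the source)."""
    reader = csv.DictReader(f)
    return [
        ClipAnnotation(
            video_id=row["video_id"],
            start=float(row["start"]),
            end=float(row["end"]),
            text=row["text"],
        )
        for row in reader
    ]


def youtube_url(clip: ClipAnnotation) -> str:
    """Build a watch URL that starts playback at the clip's start time."""
    return f"https://www.youtube.com/watch?v={clip.video_id}&t={int(clip.start)}s"


if __name__ == "__main__":
    # Made-up example row, for illustration only (not taken from the dataset).
    sample = 'video_id,start,end,text\nabc123,55.0,65.0,"Calm piano, soft strings"\n'
    for clip in read_split(io.StringIO(sample)):
        print(clip.end - clip.start, youtube_url(clip))
```

On the real files, open them with `open("test.csv", newline="", encoding="utf-8")` and pass the file object to `read_split`; the length of the returned list should then match the split sizes given above.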

For more information, please check our project page and paper: https://www.danielbmckee.com/language-guided-music-for-video/

Citation

If you use this dataset, please cite our paper:

McKee, D., Salamon, J., Sivic, J., & Russell, B. (2023). Language-Guided Music Recommendation for Video via Prompt Analogies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023).

Bibtex:

@InProceedings{mckee2023language,
  author    = {McKee, Daniel and Salamon, Justin and Sivic, Josef and Russell, Bryan},
  title     = {Language-Guided Music Recommendation for Video via Prompt Analogies},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}

Files

Files (525.3 kB in total)

Name          Size       MD5 checksum
LICENSE.txt   2.3 kB     c5f456273306498980357ea5242fbb40
README.md     2.6 kB     e806bd4fc3f651ee9265ea6269f8b57f
test.csv      395.9 kB   9794a57022478cef6973ca34e03c259f
train.csv     124.4 kB   07fe5411c7e9e53737e1401eb252e312
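After downloading, the files can be checked against the MD5 checksums listed above. The sketch below uses Python's `hashlib` and assumes all four files were saved unchanged into a single local directory. The names `md5_of` and `verify` are hypothetical helpers, not part of the dataset release.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "LICENSE.txt": "c5f456273306498980357ea5242fbb40",
    "README.md": "e806bd4fc3f651ee9265ea6269f8b57f",
    "test.csv": "9794a57022478cef6973ca34e03c259f",
    "train.csv": "07fe5411c7e9e53737e1401eb252e312",
}


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: Path) -> dict[str, bool]:
    """Map each expected file name to True if it exists in `directory`
    and its MD5 digest matches the published checksum."""
    return {
        name: (directory / name).is_file() and md5_of(directory / name) == digest
        for name, digest in EXPECTED_MD5.items()
    }


if __name__ == "__main__":
    print(verify(Path(".")))
```

A missing file is reported as `False` rather than raising an error, so a single call shows which downloads are absent or corrupted.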
