Song Interpretation Dataset

Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon

doi:10.5281/zenodo.6519264

Published May 5, 2022 | Version 1.0

Dataset Open

1. Queen Mary University of London
2. NYU Shanghai

The Song Interpretation Dataset combines data from two sources: (1) music and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We design a matching algorithm based on music metadata that aligns corresponding items across the two sources. With it, we match 25.47% of the tracks in the Music4All Dataset.
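The matching algorithm itself is described in the paper, not on this page. As a rough illustration of what metadata-based matching can look like, the sketch below joins two record lists on normalized (artist, title) keys. The function names, the record fields (`id`, `artist`, `title`) and the normalization rules are all assumptions made for this example and are not taken from the authors' implementation.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, drop bracketed suffixes and punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove bracketed additions such as "(Remastered 2011)" or "[Live]".
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text.lower())
    # Replace every remaining non-alphanumeric run with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def match_tracks(tracks, lyric_pages):
    """Map track ids to lyric-page ids that share a normalized (artist, title).

    Both arguments are lists of dicts with hypothetical keys
    "id", "artist" and "title".
    """
    index = {}
    for page in lyric_pages:
        key = (normalize(page["artist"]), normalize(page["title"]))
        index.setdefault(key, page)  # keep the first page seen for each key

    matches = {}
    for track in tracks:
        key = (normalize(track["artist"]), normalize(track["title"]))
        if key in index:
            matches[track["id"]] = index[key]["id"]
    return matches
```

Exact matching on normalized keys is deliberately conservative: it misses spelling variants but rarely pairs the wrong songs, which is one plausible reason why only part of a catalogue ends up matched.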

The dataset contains audio excerpts from 27,834 songs (30 seconds each, sampled at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyrics, and the number of votes each interpretation received. The average interpretation is 97 words long. The music covers various genres, of which the top 5 are: Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213) and Folk (1,760).
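To put these figures in proportion, the following back-of-the-envelope calculation derives a few aggregate quantities from the numbers quoted above. It assumes that "about 490,000" can be treated as 490,000 and that genre counts are per song; whether a song can carry more than one genre label is not stated here, so the shares are only indicative.

```python
# Figures quoted in the dataset description.
NUM_SONGS = 27_834
EXCERPT_SECONDS = 30
NUM_INTERPRETATIONS = 490_000  # "about 490,000", treated as exact here
MEAN_WORDS_PER_INTERPRETATION = 97
TOP_GENRES = {
    "Rock": 11_626,
    "Pop": 6_071,
    "Metal": 2_516,
    "Electronic": 2_213,
    "Folk": 1_760,
}

# Derived quantities.
total_audio_hours = NUM_SONGS * EXCERPT_SECONDS / 3600
interpretations_per_song = NUM_INTERPRETATIONS / NUM_SONGS
approx_total_words = NUM_INTERPRETATIONS * MEAN_WORDS_PER_INTERPRETATION
genre_share = {genre: count / NUM_SONGS for genre, count in TOP_GENRES.items()}

print(f"Total audio: {total_audio_hours:.2f} hours")
print(f"Interpretations per song: {interpretations_per_song:.1f}")
print(f"Approximate interpretation text: {approx_total_words:,} words")
for genre, share in genre_share.items():
    print(f"{genre}: {share:.1%} of songs")
```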

For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".

Files

Files (591.2 MB)

Name	MD5	Size
dataset_full_256.pkl	fcf3e62e3db59e28d733f647f360dab2	281.1 MB
dataset_not_negative_256.pkl	c839669199b56977b0cecfdbe0de3613	242.1 MB
dataset_positive_256.pkl	87e00f54564173e0ccf86f841b08bf86	68.0 MB
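Because the files are Python pickles, it is worth confirming that a download is intact and unmodified before loading it. The sketch below checks local copies against the MD5 checksums listed above using only the standard library. The helper names and the assumption that the files sit in a single local directory are illustrative choices, not part of the dataset's tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "dataset_full_256.pkl": "fcf3e62e3db59e28d733f647f360dab2",
    "dataset_not_negative_256.pkl": "c839669199b56977b0cecfdbe0de3613",
    "dataset_positive_256.pkl": "87e00f54564173e0ccf86f841b08bf86",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use flat even for the 281.1 MB file. A matching checksum only shows that the file is the one published here; unpickling still executes code, so load the files only if you trust the source.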
