Song Interpretation Dataset
Creators
- 1. Queen Mary University of London
- 2. NYU Shanghai
Description
The Song Interpretation Dataset combines data from two sources: (1) music and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We design a metadata-based matching algorithm that aligns corresponding items across the two sources. With this algorithm, we match 25.47% of the tracks in the Music4All Dataset.
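The following is a minimal sketch of what a metadata-based match between two sources can look like. It is not the algorithm used to build this dataset (see the paper for that). It assumes that each record carries an artist name and a track title, and it pairs records whose normalized artist and title agree exactly. The field names `track_id`, `page_id`, `artist` and `title`, and the functions `normalize` and `match_tracks`, are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, drop bracketed suffixes and punctuation."""
    # Decompose accented characters, then remove the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove bracketed suffixes such as "(Remastered)" or "[Live]".
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text.lower())
    # Replace any remaining non-alphanumeric run with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def match_tracks(tracks, lyric_pages):
    """Map track_id -> page_id where normalized (artist, title) agree exactly.

    Both arguments are lists of dicts; the keys are assumptions made for
    this sketch, not the schema of either source.
    """
    index = {
        (normalize(page["artist"]), normalize(page["title"])): page["page_id"]
        for page in lyric_pages
    }
    matches = {}
    for track in tracks:
        key = (normalize(track["artist"]), normalize(track["title"]))
        if key in index:
            matches[track["track_id"]] = index[key]
    return matches
```

Exact matching on normalized keys is conservative: it leaves out tracks whose metadata differs in spelling between the two sources, which is one reason a match rate well below 100% is expected from this kind of approach.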
The dataset contains audio excerpts from 27,834 songs (30 seconds each, sampled at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyric text, and the number of votes each interpretation received. The average length of the interpretations is 97 words. Music in the dataset covers various genres, of which the top 5 are: Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213) and Folk (1,760).
For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".
Files
Total size: 591.2 MB
MD5 checksum | Size |
---|---|
md5:fcf3e62e3db59e28d733f647f360dab2 | 281.1 MB |
md5:c839669199b56977b0cecfdbe0de3613 | 242.1 MB |
md5:87e00f54564173e0ccf86f841b08bf86 | 68.0 MB |
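The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the local path passed to them is whatever name the file was saved under (the file names are not shown in this listing).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until handle.read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest with the expected value.

    Accepts the checksum with or without the "md5:" prefix used in the listing.
    """
    expected = expected_md5.lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify(local_path, "md5:fcf3e62e3db59e28d733f647f360dab2")` should return `True` for an uncorrupted copy of the 281.1 MB file, where `local_path` is the location it was saved to.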