There is a newer version of the record available.

Published May 5, 2022 | Version 1.0
Dataset | Open access

Song Interpretation Dataset

  • 1. Queen Mary University of London
  • 2. NYU Shanghai

Description

The Song Interpretation Dataset combines data from two sources: (1) music audio and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We designed a matching algorithm based on music metadata that aligns corresponding items across the two sources; it successfully matched 25.47% of the tracks in the Music4All Dataset.
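The record does not describe how the metadata-based matching works. As a rough illustration only, one common approach is to normalize artist and title strings and join the two sources on the normalized pair. The sketch below assumes each source is available as hypothetical `(track_id, artist, title)` tuples; the helper names `normalize` and `match_tracks` and the normalization rules are illustrative assumptions, not the authors' algorithm.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape (track_id, artist, title); not the dataset's real schema.
Record = Tuple[str, str, str]

# Bracketed suffixes such as "(Remastered 2011)" or "[Live]".
_BRACKETED = re.compile(r"\s*[\(\[][^)\]]*[\)\]]")
# Any run of characters that is not a lowercase ASCII letter or digit.
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


def normalize(text: str) -> str:
    """Strip accents and bracketed suffixes, lowercase, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_brackets = _BRACKETED.sub("", no_accents.lower())
    return _NON_ALNUM.sub(" ", no_brackets).strip()


def match_tracks(music4all: Iterable[Record], songmeanings: Iterable[Record]) -> List[Tuple[str, str]]:
    """Pair tracks whose normalized (artist, title) keys agree exactly.

    Returns (music4all_id, songmeanings_id) pairs in Music4All order.
    If several SongMeanings entries share a key, the first one wins.
    """
    index: Dict[Tuple[str, str], str] = {}
    for track_id, artist, title in songmeanings:
        index.setdefault((normalize(artist), normalize(title)), track_id)

    pairs: List[Tuple[str, str]] = []
    for track_id, artist, title in music4all:
        key = (normalize(artist), normalize(title))
        if key in index:
            pairs.append((track_id, index[key]))
    return pairs
```

Exact-key joining like this is simple and precise but misses near-duplicates (typos, featured-artist variants); a real pipeline might add fuzzy string matching on top, at the cost of more false matches. The match rate is then `len(pairs) / number_of_music4all_tracks`.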

The dataset contains 30-second audio excerpts of 27,834 songs (sampled at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyrics, and the number of votes each interpretation received. Interpretations are 97 words long on average. The music covers a range of genres; the five most frequent are Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213), and Folk (1,760).
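As a quick sanity check on these figures: each excerpt holds the number of samples below (per audio channel), and, assuming the 27,834 songs are exactly the matched 25.47% of Music4All tracks, the match rate implies the approximate size of the Music4All source collection.

```latex
\begin{align*}
N_{\text{samples}} &= 30\,\text{s} \times 44{,}100\,\text{Hz} = 1{,}323{,}000 \ \text{samples per channel} \\
N_{\text{Music4All}} &\approx \frac{27{,}834}{0.2547} \approx 109{,}300 \ \text{tracks}
\end{align*}
```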

For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".

Files

Total size: 591.2 MB

  • 281.1 MB, md5:fcf3e62e3db59e28d733f647f360dab2
  • 242.1 MB, md5:c839669199b56977b0cecfdbe0de3613
  • 68.0 MB, md5:87e00f54564173e0ccf86f841b08bf86
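To confirm that a download is intact, its MD5 digest can be compared with the checksum listed for it above. A minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are illustrative, and the file path in the usage line is a placeholder because the file names are not shown in this copy of the record.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB at a time to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the expected checksum.

    Accepts either a bare hex digest or the "md5:<hex>" form used on the record page.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder path):
# verify("downloaded_file", "md5:fcf3e62e3db59e28d733f647f360dab2")
```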