
EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation

Hung, Hsiao-Tzu; Ching, Joann; Doh, Seungheon; Kim, Nabin; Nam, Juhan; Yang, Yi-Hsuan

The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, intended to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators.

For more detailed information about the dataset, please refer to our paper: EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation

File Description

  • midis/: MIDI clips transcribed using GiantMIDI.
    • Filename `Q1_xxxxxxx_2.mp3`: `Q1` means the clip belongs to quadrant 1 (Q1) of the valence-arousal (V-A) space, `xxxxxxx` is the song's YouTube ID, and `2` means it is the 2nd clip taken from the full song.
  • metadata/: metadata from YouTube, collected during crawling.
  • songs_lists/: YouTube URLs of songs.

  • tagging_lists/: raw tagging result for each sample.

  • label.csv: metadata that records filename, 4Q label, and annotator.

  • metadata_by_song.csv: lists all clips grouped by song. It can be used to create train/val/test splits so that the same song does not appear in both train and test.

  • scripts/prepare_split.ipynb: the notebook that creates the train/val/test splits and saves them to CSV files.
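
The filename convention and the song-level split described above can be sketched as follows. This is a minimal illustration, not the code in scripts/prepare_split.ipynb: the helper names `parse_clip_name` and `split_by_song`, the 10% validation and test fractions, and the random seed are all assumptions made for the example. It relies only on the `Q<quadrant>_<songID>_<clip number>` pattern stated in the file description.

```python
import random
from collections import defaultdict
from pathlib import Path


def parse_clip_name(filename):
    """Split 'Q1_<songID>_<n>.<ext>' into (quadrant, song_id, clip_index).

    Hypothetical helper: it assumes only the naming pattern given in the
    file description.
    """
    stem = Path(filename).stem
    # YouTube IDs may themselves contain '_' or '-', so peel the quadrant off
    # the front and the clip number off the back instead of a plain split('_').
    quadrant, rest = stem.split("_", 1)
    song_id, clip_index = rest.rsplit("_", 1)
    return quadrant, song_id, int(clip_index)


def split_by_song(filenames, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole songs (not individual clips) to train/val/test.

    The fractions and seed are illustrative defaults, not the authors' values.
    """
    clips_by_song = defaultdict(list)
    for name in filenames:
        _, song_id, _ = parse_clip_name(name)
        clips_by_song[song_id].append(name)

    # Shuffle song IDs deterministically, then cut the list into three parts.
    songs = sorted(clips_by_song)
    random.Random(seed).shuffle(songs)
    n_test = max(1, round(len(songs) * test_frac))
    n_val = max(1, round(len(songs) * val_frac))
    groups = {
        "test": songs[:n_test],
        "val": songs[n_test:n_test + n_val],
        "train": songs[n_test + n_val:],
    }
    # Expand each group of song IDs back into its list of clip filenames.
    return {
        split: [clip for song in ids for clip in clips_by_song[song]]
        for split, ids in groups.items()
    }
```

Grouping by song ID before shuffling means every clip of a song lands in the same split, even when clips from one song carry different quadrant labels.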


2.2 Update

  • Add tagging files in tagging_lists/ that were missing from the previous version.
  • Add timestamps.json for easier usage. It records all the timestamps in dict format; see scripts/load_timestamp.ipynb for an example of the format.
  • Add a script under scripts/. After the raw audio is crawled and placed in audios/raw, you can use this script to extract audio clips. The script reads timestamps.json and uses the timestamps to extract the clips, which are saved to the audios/seg folder.
  • Remove 7 MIDI files that were added by mistake, and correct the counts in metadata_by_song.csv.


2.1 Update

Add one file and one folder:

  • key_mode_tempo.csv: key, mode, and tempo information extracted from files.
  • CP_events/: CP events used in our paper, extracted using this script, with the emotion event added to the front.

Modify one folder:

  • The REMI_events/ files in version 2.0 contained some information that is not related to the paper, so it has been removed.


2.0 Update

Add two new folders:

  • corpus/: processed data that follows the preprocessing flow. (Please note that although we have 1078 clips in our dataset, we lost some clips during steps 1 to 4 of the flow, so the final number of clips in this corpus is 1052, which is the number we used for training the generative model.)
  • REMI_events/: REMI events for each MIDI file, generated using this script.





Cite this dataset

         author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
         title = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
         booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
         year = {2021}