Published October 10, 2022 | Version 1.1
Dataset Open

N20EM dataset for multimodal lyric transcription

  • 1. National University of Singapore

Description

The N20EM dataset for multimodal lyric transcription was proposed in our ACM MM 2022 paper, MM-ALT: A Multimodal Automatic Lyric Transcription System. It contains recordings in three modalities: audio, video, and inertial measurement unit (IMU) motion signals.

Camera-ready version of our paper: https://arxiv.org/abs/2207.06127

Project website: https://n20em.github.io/

Note:

  1. By downloading the dataset, you are assumed to have read and agreed to the Terms and Conditions.
  2. Commercial use is strictly prohibited.

Please cite our work as:

@inproceedings{gu2022mm,
  title={MM-ALT: A multimodal automatic lyric transcription system},
  author={Gu, Xiangming and Ou, Longshen and Ong, Danielle and Wang, Ye},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={3328--3337},
  year={2022}
}

Notes

Added data for voice activity detection (VAD) training and the accompaniment for each utterance.

Files

accompaniment.zip

Files (5.5 GB)

MD5 checksum                              Size
md5:79640d19a13311ed853907d45b7b687d      1.4 GB
md5:9cd08c1eb28017a28da17b3985f02f10      1.1 GB
md5:6fb79f5ec7f8a2efd040c62a0419f1ce      3.0 GB
md5:2594828ce15753cd3a508fe38029646e      245 Bytes
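
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the dataset, and the pairing of each checksum with a file name is an assumption you need to confirm on the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so multi-gigabyte archives do not
    have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloads/archive.zip", "md5:79640d19a13311ed853907d45b7b687d")` returns `True` only if the local file hashes to that value. The path here is a placeholder, not a file name taken from this record.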

Additional details

Related works

Is published in
Conference paper: 10.1145/3503161.3548411 (DOI)

References

  • Gu, X., Ou, L., Ong, D. and Wang, Y., 2022, October. MM-ALT: A multimodal automatic lyric transcription system. In Proceedings of the 30th ACM International Conference on Multimedia (pp. 3328-3337).