DnR-nonverbal dataset
Introduction
DnR-nonverbal is a dataset for cinematic audio source separation (CASS) based on the Divide and Remaster (DnR) dataset.
Unlike conventional CASS datasets, our dataset contains non-verbal sounds such as laughter and screaming, as found in actual movie audio. This allows CASS models to allocate non-verbal sounds to the same stem as speech. Examples of clips and separation results are available at https://tky823.github.io/hasumi2025dnr.github.io/
How to Use
- Download dnr-nonverbal.tar.gz from this page.
- Extract the archive with:
tar xvzf dnr-nonverbal.tar.gz
- (Optional) Merge the sample directories with those of the original DnR dataset. Our sample IDs are assigned so that they do not collide with DnR's.
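The optional merge step can be scripted. The sketch below is a minimal example, assuming both datasets are extracted locally and that the original DnR uses the same tr/cv/tt split directories with one subdirectory per sample ID; the function name `merge_into_dnr` and all paths are hypothetical. It first checks every target for an ID clash, then copies each DnR-nonverbal sample directory into the matching DnR split, so a clash aborts before anything is written.

```python
import shutil
from pathlib import Path

SPLITS = ("tr", "cv", "tt")


def merge_into_dnr(nonverbal_root, dnr_root, splits=SPLITS):
    """Copy DnR-nonverbal sample directories into an existing DnR tree.

    Hypothetical helper: assumes both roots contain tr/cv/tt split
    directories whose children are sample-ID directories.
    Returns the number of sample directories copied.
    """
    nonverbal_root = Path(nonverbal_root)
    dnr_root = Path(dnr_root)

    # First pass: plan every copy and detect ID clashes before touching disk.
    plan = []
    for split in splits:
        src_split = nonverbal_root / split
        if not src_split.is_dir():
            continue  # split absent from this copy; nothing to merge
        for sample_dir in sorted(p for p in src_split.iterdir() if p.is_dir()):
            target = dnr_root / split / sample_dir.name
            if target.exists():
                # IDs are meant to be disjoint from DnR; a clash signals a problem.
                raise FileExistsError(f"sample ID already present: {target}")
            plan.append((sample_dir, target))

    # Second pass: copy; copytree also creates a missing split directory.
    for src, dst in plan:
        shutil.copytree(src, dst)
    return len(plan)
```

For example, `merge_into_dnr("dnr-nonverbal", "path/to/dnr")` copies every sample. Replacing `shutil.copytree` with `shutil.move` would avoid duplicating roughly 22.7 GB of audio on disk, at the cost of emptying the extracted DnR-nonverbal tree.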
Dataset Structure
The dataset structure follows DnR, except that our dataset also contains non-verbal sounds as part of the speech stem.
dnr-nonverbal
├── tr
│ ├── 100009
│ │ ├── annots.csv
│ │ ├── background.wav
│ │ ├── foreground.wav
│ │ ├── mix.wav
│ │ ├── music.wav
│ │ ├── nonverbal.wav
│ │ ├── reading.wav
│ │ ├── sfx.wav
│ │ └── speech.wav
│ ├── 100031
│ ...
├── cv
└── tt
- reading.wav: Reading-style speech extracted from LibriSpeech.
- nonverbal.wav: Non-verbal sounds collected from FSD50K and newly crawled from Freesound.
- speech.wav: Mixture of reading-style speech and non-verbal sounds.
- music.wav: Background music extracted from FMA (medium).
- foreground.wav: Foreground effect sounds collected from FSD50K.
- background.wav: Background effect sounds collected from FSD50K.
- sfx.wav: Mixture of foreground and background effect sounds.
- annots.csv: Metadata file identifying the source of each sound.
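Because speech.wav and sfx.wav are described as mixtures of other stems, a quick sanity check on a downloaded sample is to compare each mixture with the sum of its parts. The sketch below rests on assumptions not stated above: the stems are sample-aligned, each mixture is a plain sum, mix.wav equals speech + music + sfx (carried over from DnR), and the files are 16-bit PCM WAV readable by Python's `wave` module (float WAV files would need another reader such as soundfile). The helper names are hypothetical.

```python
import array
import sys
import wave
from pathlib import Path

# Assumed mixture -> parts relations; verify against your copy of the data.
RELATIONS = {
    "speech.wav": ("reading.wav", "nonverbal.wav"),
    "sfx.wav": ("foreground.wav", "background.wav"),
    "mix.wav": ("speech.wav", "music.wav", "sfx.wav"),
}


def read_pcm16(path):
    """Read a 16-bit PCM WAV file as (channels, interleaved floats in [-1, 1))."""
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * f.getsampwidth()}-bit")
        channels = f.getnchannels()
        frames = f.readframes(f.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return channels, [s / 32768.0 for s in samples]


def max_residual(sample_dir, target, parts):
    """Largest |target - sum(parts)| over all samples of one clip."""
    sample_dir = Path(sample_dir)
    channels, mixture = read_pcm16(sample_dir / target)
    stems = []
    for name in parts:
        ch, data = read_pcm16(sample_dir / name)
        if ch != channels or len(data) != len(mixture):
            raise ValueError(f"{name} does not match {target} in shape")
        stems.append(data)
    return max((abs(m - sum(vals)) for m, *vals in zip(mixture, *stems)), default=0.0)


def check_sample(sample_dir, tol=4 / 32768):
    """Return {mixture: residual} for every relation whose residual exceeds tol.

    The default tolerance allows a few LSBs of 16-bit rounding per file.
    """
    failures = {}
    for target, parts in RELATIONS.items():
        residual = max_residual(sample_dir, target, parts)
        if residual > tol:
            failures[target] = residual
    return failures
```

Under these assumptions, `check_sample("dnr-nonverbal/tr/100009")` returns an empty dict when all three relations hold. The pure-Python loop is slow on full-length clips; a NumPy-based reader would be far faster when checking the whole dataset.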
Citation
@inproceedings{hasumi25_interspeech,
title= {{DnR-nonverbal: Cinematic audio source separation dataset containing non-verbal sounds}},
author={Takuya Hasumi and Yusuke Fujita},
year= {2025},
booktitle = {Interspeech 2025},
pages= {4993--4997},
doi= {10.21437/Interspeech.2025-1148},
issn={2958-1796},
}
Files
- dnr-nonverbal.tar.gz: 22.7 GB, md5:c3d80ce875d8d408439a20b65d6c4405
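The published MD5 checksum (c3d80ce875d8d408439a20b65d6c4405) can confirm that the 22.7 GB download is intact before extraction. Below is a minimal sketch using Python's standard `hashlib`, reading the file in 1 MiB chunks so the archive is never loaded into memory at once; the default local filename is an assumption about where the archive was saved.

```python
import hashlib

EXPECTED_MD5 = "c3d80ce875d8d408439a20b65d6c4405"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="dnr-nonverbal.tar.gz", expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

`verify_archive()` should return True for an intact download. On Linux, `md5sum dnr-nonverbal.tar.gz` prints the same digest.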