There is a newer version of the record available.

Published May 20, 2025 | Version v1
Dataset Open

DnR-nonverbal dataset

Description

Introduction

DnR-nonverbal is a dataset for cinematic audio source separation (CASS) based on the Divide and Remaster (DnR) dataset.

Unlike conventional CASS datasets, DnR-nonverbal contains non-verbal sounds such as laughter and screaming, as actual movie audio does. It enables CASS models to assign non-verbal sounds to the same stem as speech. Examples of clips and separation results are available at https://tky823.github.io/hasumi2025dnr.github.io/

How to Use

  1. Download dnr-nonverbal.tar.gz from this page.
  2. Extract dnr-nonverbal.tar.gz by
    tar xvzf dnr-nonverbal.tar.gz
  3. (Optional) Merge the extracted directories with those of DnR. Our sample IDs are assigned so that they do not collide with DnR sample IDs.
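
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not an official tool: the function names `extract_archive` and `merge_into_dnr` are hypothetical, and it assumes that the archive unpacks into a top-level `dnr-nonverbal` directory and that your local DnR copy uses the same `tr`/`cv`/`tt` split directories. It refuses to overwrite an existing sample directory, so an unexpected ID collision is reported instead of silently replacing data.

```python
import shutil
import tarfile
from pathlib import Path

SPLITS = ("tr", "cv", "tt")  # split directories shown under "Dataset Structure"


def extract_archive(archive: Path, dest: Path) -> Path:
    """Extract the tarball into dest and return the assumed dataset root."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # Assumption: the archive contains a top-level "dnr-nonverbal" directory.
    return dest / "dnr-nonverbal"


def merge_into_dnr(nonverbal_root: Path, dnr_root: Path) -> int:
    """Move each sample directory into the matching split of a DnR copy.

    Assumption: dnr_root contains (or should contain) tr/cv/tt directories.
    Raises FileExistsError instead of overwriting when a sample ID exists.
    Returns the number of sample directories moved.
    """
    moved = 0
    for split in SPLITS:
        src_split = nonverbal_root / split
        if not src_split.is_dir():
            continue  # skip splits that are absent from the extracted data
        dst_split = dnr_root / split
        dst_split.mkdir(parents=True, exist_ok=True)
        for sample in sorted(p for p in src_split.iterdir() if p.is_dir()):
            target = dst_split / sample.name
            if target.exists():
                raise FileExistsError(f"sample ID already present: {target}")
            shutil.move(str(sample), str(target))
            moved += 1
    return moved
```

For example, `merge_into_dnr(extract_archive(Path("dnr-nonverbal.tar.gz"), Path("data")), Path("data/dnr"))` would unpack the archive under `data/` and move its samples into a DnR copy at the placeholder path `data/dnr`.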

Dataset Structure

The dataset structure is based on DnR, except that our dataset contains non-verbal sounds as a part of the speech stem.

dnr-nonverbal
├── tr
│   ├── 100009
│   │   ├── annots.csv
│   │   ├── background.wav
│   │   ├── foreground.wav
│   │   ├── mix.wav
│   │   ├── music.wav
│   │   ├── nonverbal.wav
│   │   ├── reading.wav
│   │   ├── sfx.wav
│   │   └── speech.wav
│   ├── 100031
│   ...
├── cv
└── tt
  • reading.wav: Reading-style speech extracted from LibriSpeech.
  • nonverbal.wav: Non-verbal sounds collected from FSD50K and newly crawled from FreeSound.
  • speech.wav: Mixture of reading-style speech and non-verbal sounds.
  • music.wav: Background music extracted from FMA (medium).
  • foreground.wav: Foreground effect sounds collected from FSD50K.
  • background.wav: Background effect sounds collected from FSD50K.
  • sfx.wav: Foreground and background effect sounds.
  • annots.csv: A metadata file that identifies sources of sounds.
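
To work with this layout programmatically, one option is to index each split before loading any audio. The sketch below is illustrative only: `index_split` and `EXPECTED_FILES` are hypothetical names, and it assumes every sample directory contains exactly the files listed above. It uses only the Python standard library and does not read the audio itself.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "annots.csv", "background.wav", "foreground.wav", "mix.wav",
    "music.wav", "nonverbal.wav", "reading.wav", "sfx.wav", "speech.wav",
)


def index_split(root: Path, split: str) -> dict[str, dict[str, Path]]:
    """Map each sample ID in a split to its files, keyed by name without suffix.

    Raises FileNotFoundError if a sample directory lacks an expected file.
    """
    index = {}
    for sample_dir in sorted(p for p in (root / split).iterdir() if p.is_dir()):
        missing = [n for n in EXPECTED_FILES if not (sample_dir / n).is_file()]
        if missing:
            raise FileNotFoundError(f"{sample_dir} is missing {missing}")
        index[sample_dir.name] = {
            Path(name).stem: sample_dir / name for name in EXPECTED_FILES
        }
    return index
```

With the dataset extracted, `index_split(Path("dnr-nonverbal"), "tr")["100009"]["speech"]` would give the path to the speech stem of sample 100009, which can then be passed to an audio loader of your choice.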

Citation

@inproceedings{hasumi25_interspeech,
  title= {{DnR-nonverbal: Cinematic audio source separation dataset containing non-verbal sounds}},
  author={Takuya Hasumi and Yusuke Fujita},
  year= {2025},
  booktitle = {Interspeech 2025},
  pages= {4993--4997},
  doi= {10.21437/Interspeech.2025-1148},
  issn={2958-1796},
}

Files

Files (22.7 GB)

md5:c3d80ce875d8d408439a20b65d6c4405
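
Because the archive is large, it may be worth checking the download against the MD5 value listed above before extracting it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are hypothetical, and it assumes the listed checksum refers to dnr-nonverbal.tar.gz. The file is read in chunks so the whole archive is never held in memory.

```python
import hashlib
from pathlib import Path

# Checksum listed on this page (assumed to belong to dnr-nonverbal.tar.gz).
EXPECTED_MD5 = "c3d80ce875d8d408439a20b65d6c4405"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of(path) == expected.lower()
```

A call such as `verify(Path("dnr-nonverbal.tar.gz"))` returning `False` would suggest an incomplete or corrupted download.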