DnR-nonverbal dataset

Takuya Hasumi; Yusuke Fujita

doi:10.5281/zenodo.15470640

Published May 20, 2025 | Version v1

Dataset Open

Introduction

DnR-nonverbal is a dataset for cinematic audio source separation (CASS) based on the Divide and Remaster (DnR) dataset.

Unlike conventional CASS datasets, our dataset contains non-verbal sounds such as laughter and screaming, as actual movie audio does. This enables CASS models to assign non-verbal sounds to the same stem as speech. Examples of clips and separation results are available at https://tky823.github.io/hasumi2025dnr.github.io/

How to Use

Download dnr-nonverbal.tar.gz from this page.
Extract dnr-nonverbal.tar.gz with:
```
tar xvzf dnr-nonverbal.tar.gz
```
(Optional) Merge the extracted directories with those of DnR. Our sample IDs are assigned so that they do not collide with any DnR sample ID.
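
The optional merge step could be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `merge_into_dnr` is hypothetical, and it assumes that the local DnR copy uses the same `tr`/`cv`/`tt` split directories as this dataset. It links sample directories instead of copying them, and it refuses to proceed if a sample ID already exists at the destination.

```python
import os
from pathlib import Path

# Split directory names used by DnR-nonverbal. That the local DnR copy uses
# the same names is an assumption of this sketch.
SPLITS = ("tr", "cv", "tt")


def merge_into_dnr(nonverbal_root, dnr_root, dry_run=True):
    """Symlink each DnR-nonverbal sample directory into the matching DnR split.

    Returns the list of (source, destination) pairs. Raises FileExistsError
    before creating any link if a sample ID is already present in DnR.
    """
    nonverbal_root, dnr_root = Path(nonverbal_root), Path(dnr_root)
    planned = []
    for split in SPLITS:
        src_split = nonverbal_root / split
        if not src_split.is_dir():
            continue  # tolerate a partially extracted archive
        for sample_dir in sorted(p for p in src_split.iterdir() if p.is_dir()):
            dest = dnr_root / split / sample_dir.name
            if dest.exists() or dest.is_symlink():
                raise FileExistsError(f"sample ID collision: {dest}")
            planned.append((sample_dir, dest))
    if not dry_run:
        for src, dest in planned:
            dest.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(src.resolve(), dest, target_is_directory=True)
    return planned
```

Calling `merge_into_dnr("dnr-nonverbal", "path/to/dnr")` with the default `dry_run=True` only reports what would be linked; the DnR path here is a placeholder. Symlinks keep the 22.7 GB of audio in one place, but copying works equally well if the file system does not support links.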

Dataset Structure

The dataset structure is based on DnR, except that our dataset contains non-verbal sounds as a part of the speech stem.

dnr-nonverbal
├── tr
│   ├── 100009
│   │   ├── annots.csv
│   │   ├── background.wav
│   │   ├── foreground.wav
│   │   ├── mix.wav
│   │   ├── music.wav
│   │   ├── nonverbal.wav
│   │   ├── reading.wav
│   │   ├── sfx.wav
│   │   └── speech.wav
│   ├── 100031
│   ...
├── cv
└── tt

reading.wav: Reading-style speech extracted from LibriSpeech.
nonverbal.wav: Non-verbal sounds collected from FSD50K and newly crawled from FreeSound.
speech.wav: Mixture of reading-style speech and non-verbal sounds.
music.wav: Background music extracted from FMA (medium).
foreground.wav: Foreground effect sounds collected from FSD50K.
background.wav: Background effect sounds collected from FSD50K.
sfx.wav: Foreground and background effect sounds.
annots.csv: A metadata file that identifies sources of sounds.
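
The relationships between stems described above can be spot-checked with a short script. The sketch below rests on two assumptions that the description does not confirm: that the files are 16-bit PCM WAV, and that each mixture is a plain sample-wise sum of its parts with no rescaling. The helper names `read_pcm16` and `max_residual` are hypothetical. For other encodings, a library such as soundfile would be needed instead of the standard `wave` module.

```python
import array
import sys
import wave
from pathlib import Path


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return its samples as a list of ints."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM (assumption of this sketch)")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples.tolist()


def max_residual(sample_dir, mixture, parts):
    """Largest absolute difference between a stem and the sum of its parts.

    Only the overlapping length is compared if the files differ in length.
    """
    sample_dir = Path(sample_dir)
    target = read_pcm16(sample_dir / mixture)
    part_samples = [read_pcm16(sample_dir / name) for name in parts]
    summed = [sum(values) for values in zip(*part_samples)]
    n = min(len(target), len(summed))
    return max((abs(target[i] - summed[i]) for i in range(n)), default=0)
```

For example, `max_residual("dnr-nonverbal/tr/100009", "speech.wav", ["reading.wav", "nonverbal.wav"])` compares the speech stem against its two components, and the same call with `"sfx.wav"` and `["foreground.wav", "background.wav"]` does so for the effects stem. A small residual would be consistent with additive mixing; a large one suggests that gain changes or clipping were applied.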

Citation

@inproceedings{hasumi25_interspeech,
  title = {{DnR-nonverbal: Cinematic audio source separation dataset containing non-verbal sounds}},
  author = {Takuya Hasumi and Yusuke Fujita},
  year = {2025},
  booktitle = {Interspeech 2025},
  pages = {4993--4997},
  doi = {10.21437/Interspeech.2025-1148},
  issn = {2958-1796},
}

Files

Files (22.7 GB)

Name	Size	MD5
dnr-nonverbal.tar.gz	22.7 GB	c3d80ce875d8d408439a20b65d6c4405
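
Because the archive is large, it may be worth checking the download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the file is read in chunks so that the whole archive never has to fit in memory.

```python
import hashlib

# MD5 checksum listed on this page for dnr-nonverbal.tar.gz
EXPECTED_MD5 = "c3d80ce875d8d408439a20b65d6c4405"


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of(path) == expected.lower()
```

`verify_archive("dnr-nonverbal.tar.gz")` returning `False` would suggest an incomplete or corrupted download.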
