PodcastMix - a dataset for separating music and speech in podcasts

Nicolas Schmidt; Jordi Pons; Marius Miron

doi:10.5281/zenodo.5597047

Published October 6, 2021 | Version 0.1

Dataset Open

1. Universitat Pompeu Fabra, Barcelona

Note: due to Zenodo limitations, we host solely the metadata here. The whole dataset can be found at: https://drive.google.com/drive/u/0/folders/1tpg9WXkl4L0zU84AwLQjrFqnP-jw1t7z

We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. It contains audio files at 44.1 kHz and the corresponding metadata. For further details, see the following paper, thesis, and associated GitHub repository:

N. Schmidt, J. Pons, M. Miron, "PodcastMix - a dataset for separating music and speech in podcasts", Interspeech (2022)
N. Schmidt, "PodcastMix - a dataset for separating music and speech in podcasts", Master's thesis, MTG, UPF (2021) https://zenodo.org/record/5554790#.YXLHvNlByWA
https://github.com/MTG/Podcastmix

This dataset contains four parts. Due to the Zenodo file size limitation, we host the training dataset on Google Drive. The content of the Zenodo archive for each part is indicated in brackets:

[metadata] PodcastMix-synth train: a large and diverse training set that is programmatically generated (with a validation partition). The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
[metadata] PodcastMix-synth test: a programmatically generated test set with reference stems to compute evaluation metrics. The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
[audio and metadata] PodcastMix-real with-reference: a test set of real podcasts with reference stems to compute evaluation metrics. The podcasts were recorded by one of the authors, and the music comes from the FMA dataset.
[audio and metadata] PodcastMix-real no-reference: a test set of real podcasts containing only the podcast mixes, for subjective evaluation. The podcasts were compiled from the internet.

The training dataset, PodcastMix-synth, can be found in our Google Drive repository: https://drive.google.com/drive/folders/1tpg9WXkl4L0zU84AwLQjrFqnP-jw1t7z?usp=sharing . The archive comprises 450 GB of audio and metadata with the following structure:

[metadata and audio] PodcastMix-synth train: a large and diverse training set that is programmatically generated (with a validation partition). The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
[metadata and audio] PodcastMix-synth test: a programmatically generated test set with reference stems to compute evaluation metrics. The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.

Make sure you maintain the folder structure of the original dataset when you uncompress these files.
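
One way to keep the folder structure intact is to extract every downloaded archive into the same root directory without renaming or flattening anything. The following is a minimal sketch, assuming the downloaded files are .zip archives sitting in a single local directory; the function name and the directory names are hypothetical and not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, dest_dir):
    """Extract every .zip found in archive_dir into dest_dir.

    ZipFile.extractall recreates the directory tree stored inside each
    archive, so the internal folder structure is kept as-is.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # Hypothetical locations: adjust to wherever the archives were downloaded.
    downloads = Path("downloads")
    if downloads.is_dir():
        print(extract_archives(downloads, "podcastmix"))
```

If the archives turn out to use a different compression format, the same idea applies: extract all of them into one common root and avoid moving files afterwards.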

This dataset is created by Nicolas Schmidt, Marius Miron, Music Technology Group - Universitat Pompeu Fabra (Barcelona) and Jordi Pons. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 Unported License (CC BY-SA 4.0).

Please acknowledge PodcastMix in academic research. When the present dataset is used for academic research, we would highly appreciate it if authors cited the following publications:

N. Schmidt, J. Pons, M. Miron, "PodcastMix - a dataset for separating music and speech in podcasts", Interspeech (2022)
N. Schmidt, "PodcastMix - a dataset for separating music and speech in podcasts", Master's thesis, MTG, UPF (2021) https://zenodo.org/record/5554790#.YXLHvNlByWA

The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the UPF is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the dataset or any part of it.

PURPOSES. The data is processed for the general purpose of carrying out research development and innovation studies, works or projects. In particular, but without limitation, the data is processed for the purpose of communicating with Licensee regarding any administrative and legal / judicial purposes.

Files

podcastmix.zip (99.8 MB)
md5:80087d986b1dda20efe75f6aab83d624
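
The MD5 checksum listed above can be used to check that the archive downloaded without corruption. The following is a minimal sketch, assuming podcastmix.zip was saved to the current working directory; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum listed for podcastmix.zip on this record.
EXPECTED_MD5 = "80087d986b1dda20efe75f6aab83d624"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("podcastmix.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually means an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.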
