Published September 9, 2024 | Version 1.0
Dataset Open

Biodenoising validation

  • 1. Earth Species Project

Description

Biodenoising_validation is a benchmark dataset for animal vocalization denoising. It contains 62 pairs of clean animal vocalizations and noise excerpts. 

We list the data sources in the clean.csv and noise.csv files.

The dataset is provided at two sample rates: 16000 Hz and 44100 Hz. Each sample-rate subfolder contains the clean, noise, and noisy subfolders, along with metadata describing the data sources.

Methodology
We programmatically create mixtures by pairing vocalizations with noise at random signal-to-noise ratios (SNRs) drawn from a uniform distribution between -5 and 10 dB (2.8 dB on average). To ensure reproducibility, we start with a fixed seed that controls the SNR of the mixtures. The samples are between 1 and 60 seconds long (20.14 seconds on average). We split the vocalizations and noises into two lists: underwater (11 vocalizations and 26 noises) and terrestrial (51 vocalizations and 20 noises). Within each list, we sort the vocalizations and the noise samples by duration and pair them in that order, e.g. matching the longest calls with the longest noises.
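The procedure described above can be illustrated with a minimal sketch. It is not the code used to build the dataset (see the repository listed under Software for that). The function names `pair_by_duration`, `sample_snrs`, and `mix_at_snr` are hypothetical, as is the seed value. The sketch also makes two assumptions that the description does not confirm: noise shorter than the vocalization is tiled to the same length, and when the two lists differ in size, pairing stops at the shorter one.

```python
import numpy as np


def pair_by_duration(vocal_durations, noise_durations):
    """Pair vocalizations and noises by duration rank (longest with longest).

    Returns a list of (vocalization_index, noise_index) tuples.
    Assumption: pairing stops at the shorter list; the source does not
    say how lists of unequal size are handled.
    """
    v_order = sorted(range(len(vocal_durations)),
                     key=lambda i: vocal_durations[i], reverse=True)
    n_order = sorted(range(len(noise_durations)),
                     key=lambda i: noise_durations[i], reverse=True)
    return list(zip(v_order, n_order))


def sample_snrs(n, seed=0, low=-5.0, high=10.0):
    """Draw n SNR values (dB) from a uniform distribution with a fixed seed.

    The seed value 0 is a placeholder, not the seed used for the dataset.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=n)


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean + noise has the requested SNR in dB.

    Assumption: noise is tiled or truncated to the length of the clean
    signal (np.resize repeats the array as needed).
    """
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(clean_power / (gain^2 * noise_power))
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

For example, `pair_by_duration([3, 10, 5], [2, 8, 1, 9])` matches the 10-second vocalization with the 9-second noise, the 5-second one with the 8-second noise, and so on. Fixing the seed in `sample_snrs` makes the drawn SNR values, and hence the mixtures, repeatable across runs.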

Citation
Miron, Marius, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano, "Biodenoising: animal vocalization denoising without access to clean data," 

License
This dataset is provided for educational purposes only, and the material it contains should not be used for any commercial purpose without the express permission of the copyright holders.

Contact   
info@mariusmiron.com

Files

biodenoising_validation_1.0.zip (706.2 MB)
md5:05648b477b73f0e71cf98441a98630ef

Additional details

Additional titles

Alternative title
Biodenoising: animal vocalization denoising without access to clean data

Software

Repository URL
https://github.com/earthspecies/biodenoising-datasets
Programming language
Python
