Published May 24, 2025 | Version v1
Dataset Open

Voxaboxen Data

Description

Overview

Detecting the sounds produced by animals is the foundation of bioacoustics research. This task must often be performed using noisy recordings that include many overlapping sounds from multiple individuals. Identifying each individual acoustic unit is necessary for a diversity of tasks, including species recognition and population estimation, which are critical to research on topics such as ecology and conservation.

This dataset consists of eight real component datasets for evaluating bioacoustic sound event detection performance, as well as six synthetic component datasets. Each component dataset consists of several audio recordings. Annotations consist of the start and stop times of each event of interest, as well as a class label.

Component datasets

The eight real component datasets are used to evaluate bioacoustic sound event detection performance. Seven of them are derived from data that appeared in previous publications. The license and original citation for each component dataset are given in that dataset's license file. If you use a component dataset, please cite our paper (see below) in addition to the original work. Dataset characteristics are summarized below:

| Dataset | N. Files (train/val/test) | N. Classes | Dur. (hr) (train/val/test) | N. Events (train/val/test) | Mean event dur. (sec) | Location | Taxa |
|---|---|---|---|---|---|---|---|
| Anuraset (AnSet) | 967/322/323 | 10 | 16.09/5.37/5.37 | 4279/1893/1635 | 6.23 | Brazil | Anura |
| BirdVox-10h (BV10) | 5/5/5 | 1 | 6.00/2.00/2.00 | 4196/1064/3764 | 0.15 | New York, USA | Passeriformes |
| Hawaii Birds (HawB) | 379/126/130 | 9 | 30.48/10.05/10.35 | 33372/11209/11132 | 1.11 | Hawaii, USA | Aves |
| Humpback (HbW) | 388/125/129 | 1 | 8.08/2.60/2.69 | 2952/959/865 | 0.99 | North Pacific Ocean | Megaptera novaeangliae |
| Katydids (Katy) | 16/5/6 | 1 | 2.66/0.83/1.00 | 7434/1550/2977 | 0.17 | Panama | Tettigoniidae |
| Meerkat (MT) | 2/2/2 | 1 | 0.76/0.25/0.25 | 773/269/252 | 0.15 | South Africa | Suricata suricatta |
| Powdermill (Pow) | 44/14/19 | 6 | 3.67/1.17/1.58 | 5138/2276/2505 | 1.11 | Pennsylvania, USA | Passeriformes |
| Overlapping Zebra Finch (OZF) | 46/6/13 | 1 | 0.77/0.10/0.22 | 5514/1246/1744 | 0.11 | Laboratory | Taeniopygia castanotis |

This dataset also includes six synthetic component datasets, named Overlapping Zebra Finch Synthetic (OZF Synthetic) x, where x is one of 0.0, 0.2, 0.4, 0.6, 0.8, or 1.0. The value of x is the ratio of the number of overlapping call pairs to the number of calls. All six synthetic component datasets share the same characteristics, which were designed to mirror those of the real OZF dataset:

| Dataset | N. Files (train/val/test) | N. Classes | Dur. (hr) (train/val/test) | N. Events (train/val/test) | Mean event dur. (sec) | Location | Taxa |
|---|---|---|---|---|---|---|---|
| OZF Synthetic x | 65 | 1 | 1.08 | 5514/1246/1744 | 0.13 | Synthetic | Taeniopygia castanotis |
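As an illustration of the ratio x described above, the following minimal sketch counts overlapping pairs of events and divides by the number of events. It assumes that two calls "overlap" when their time intervals intersect with positive duration; the exact definition used to generate the synthetic datasets may differ. The function name `overlap_ratio` is hypothetical and is not part of the Voxaboxen code.

```python
def overlap_ratio(events):
    """Return (number of overlapping event pairs) / (number of events).

    `events` is a list of (begin, end) tuples in seconds.
    Assumption: two events overlap if their intervals intersect
    with positive duration (touching endpoints do not count).
    """
    if not events:
        return 0.0
    # Sort by begin time so that, for a fixed event i, we can stop
    # scanning as soon as a later event begins at or after event i ends.
    ordered = sorted(events)
    n_pairs = 0
    for i, (_, end_i) in enumerate(ordered):
        for begin_j, _ in ordered[i + 1:]:
            if begin_j >= end_i:
                break
            n_pairs += 1
    return n_pairs / len(ordered)


if __name__ == "__main__":
    calls = [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0), (2.5, 3.5)]
    print(overlap_ratio(calls))
```

In this toy example, four calls contain two overlapping pairs, which gives a ratio of 0.5.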

Dataset details

Each dataset includes three info files: train_info.csv, val_info.csv, and test_info.csv, which define the splits into train, validation, and test sets. Each info file has the following columns, where each row corresponds to one audio file:

- fn: The unique filename associated with the audio file.
- audio_fp: Relative filepath to the audio file.
- selection_table_filepath: Relative filepath to the annotations, which are in the form of a Raven selection table.

Selection tables are tab-separated `.txt` files. Each row corresponds to one annotated audio event (typically, a vocalization). Each selection table has (at least) the following columns: 

- Begin Time (s): The time at which the event starts in the audio file.
- End Time (s): The time at which the event ends in the audio file.
- Annotation: A label (e.g. species) associated with the audio event.

Some selection tables have no rows. This indicates that no events of interest occur in the corresponding recording.
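The file layout described above can be read with the Python standard library alone. The sketch below is a hedged example, not the loader used in the Voxaboxen repository: the function names (`read_info`, `read_selection_table`, `load_split`) are hypothetical, and it assumes that the info file sits in a root directory and that the relative filepaths in the info file are resolved against that same directory.

```python
import csv
from pathlib import Path


def read_info(info_path):
    """Read an info file (e.g. train_info.csv) into a list of row dicts."""
    with open(info_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_selection_table(table_path):
    """Read a tab-separated selection table into (begin, end, label) tuples.

    A table with a header but no rows yields an empty list, meaning the
    recording contains no events of interest.
    """
    events = []
    with open(table_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            events.append(
                (
                    float(row["Begin Time (s)"]),
                    float(row["End Time (s)"]),
                    row["Annotation"],
                )
            )
    return events


def load_split(root, info_name):
    """Map each audio file's `fn` to its list of annotated events.

    Assumption: relative paths in the info file are relative to `root`.
    """
    root = Path(root)
    events_by_file = {}
    for row in read_info(root / info_name):
        table_path = root / row["selection_table_filepath"]
        events_by_file[row["fn"]] = read_selection_table(table_path)
    return events_by_file
```

For example, `load_split("path/to/dataset", "train_info.csv")` would return a dictionary keyed by `fn`, where `path/to/dataset` is a placeholder for wherever a component dataset has been unpacked.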

Associated papers and code

If you use this data, please cite the associated paper, which is currently available at https://arxiv.org/abs/2503.02389.

The code associated with the paper can be found at https://github.com/earthspecies/voxaboxen.

If you use any of the datasets besides OZF and OZF synthetic, please also cite the original work (found in the license file for each dataset).

Files

AnSet.zip

Files (22.4 GB)

Size and checksum of each file:

- 7.2 GB, md5:1ee4e9f506ad13ac252a231d3c8dac3e
- 1.2 GB, md5:214cc0e4743641a682ee25030ee8e5bd
- 8.6 GB, md5:9b5cc24a6f377bfd17c777cf3bad8bcc
- 529.7 MB, md5:2f185b86e25f80df0a634d9520cc850d
- 2.4 GB, md5:643871706f49b70f58c70dae3f048414
- 24.4 MB, md5:c82ee21ca7fe906e71abd2789ea462ad
- 352.2 MB, md5:12f85aded4f0f94c3cabb1f7f1a5048f
- 639.4 MB, md5:2043ea73143d08ab221c408e430fcabd
- 1.4 GB, md5:1485f3860c7fad14ca542e149f7aebd8
- 4.5 kB, md5:2f4d19cadc5928411b9064bbf75089b2
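
The md5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` is hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of the file at `path`.

    The file is read in chunks of `chunk_size` bytes (1 MiB by default).
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared with the part of the listed checksum that follows the `md5:` prefix.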

Additional details

Related works

Is described by
Preprint: arXiv:2503.02389 (arXiv)

Software

Repository URL
https://github.com/earthspecies/voxaboxen
Programming language
Python