Voxaboxen Data
Description
Overview
Detecting the sounds produced by animals is the foundation of bioacoustics research. This task must often be performed using noisy recordings that include many overlapping sounds from multiple individuals. Identifying each individual acoustic unit is necessary for a diversity of tasks, including species recognition and population estimation, which are critical to research on topics such as ecology and conservation.
This dataset consists of eight real component datasets for evaluating bioacoustic sound event detection performance, as well as six synthetic component datasets. Each component dataset consists of several audio recordings. Annotations consist of the start and stop times of each event of interest, as well as a class label.
Component datasets
The eight real component datasets are used to evaluate bioacoustic sound event detection performance. Seven of them are derived from data that appeared in previous publications. For the license and original citation of each component dataset, please see its license file. If you use a component dataset, please cite our paper (see below) in addition to the original work. Dataset characteristics are summarized below:
Dataset | N. Files (train/val/test) | N. Classes | Dur. (hr) (train/val/test) | N. Events (train/val/test) | Mean event dur. (sec) | Location | Taxa |
---|---|---|---|---|---|---|---|
Anuraset (AnSet) | 967/322/323 | 10 | 16.09/5.37/5.37 | 4279/1893/1635 | 6.23 | Brazil | Anura |
BirdVox-10h (BV10) | 5/5/5 | 1 | 6.00/2.00/2.00 | 4196/1064/3764 | 0.15 | New York, USA | Passeriformes |
Hawaii Birds (HawB) | 379/126/130 | 9 | 30.48/10.05/10.35 | 33372/11209/11132 | 1.11 | Hawaii, USA | Aves |
Humpback (HbW) | 388/125/129 | 1 | 8.08/2.60/2.69 | 2952/959/865 | 0.99 | North Pacific Ocean | Megaptera novaeangliae |
Katydids (Katy) | 16/5/6 | 1 | 2.66/0.83/1.00 | 7434/1550/2977 | 0.17 | Panama | Tettigoniidae |
Meerkat (MT) | 2/2/2 | 1 | 0.76/0.25/0.25 | 773/269/252 | 0.15 | South Africa | Suricata suricatta |
Powdermill (Pow) | 44/14/19 | 6 | 3.67/1.17/1.58 | 5138/2276/2505 | 1.11 | Pennsylvania, USA | Passeriformes |
Overlapping Zebra Finch (OZF) | 46/6/13 | 1 | 0.77/0.10/0.22 | 5514/1246/1744 | 0.11 | Laboratory | Taeniopygia castanotis |
This dataset also includes six synthetic component datasets, Overlapping Zebra Finch Synthetic (OZF Synthetic) x, where x is one of [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. The value of x is the ratio of the number of overlapping call pairs to the number of calls. All six synthetic component datasets share the same characteristics, which were designed to mirror those of the real OZF dataset:
Dataset | N. Files (train/val/test) | N. Classes | Dur. (hr) (train/val/test) | N. Events (train/val/test) | Mean event dur. (sec) | Location | Taxa |
---|---|---|---|---|---|---|---|
OZF Synthetic x | 65 | 1 | 1.08 | 5514/1246/1744 | 0.13 | Synthetic | Taeniopygia castanotis |
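The overlap ratio x described above (overlapping call pairs divided by number of calls) can be sketched in a few lines. This is an illustrative sketch only, not the authors' generation code: the function name `overlap_ratio` is hypothetical, events are assumed to be `(begin_s, end_s)` tuples, and two events that merely touch at an endpoint are assumed not to overlap.

```python
def overlap_ratio(events):
    """Return (number of overlapping event pairs) / (number of events).

    `events` is a list of (begin_s, end_s) tuples. Assumption: two events
    overlap only if one starts strictly before the other ends.
    """
    if not events:
        return 0.0
    ordered = sorted(events)  # sort by start time
    n_pairs = 0
    for i, (_, end_i) in enumerate(ordered):
        for begin_j, _ in ordered[i + 1:]:
            if begin_j >= end_i:
                break  # later events start even later, so none overlap event i
            n_pairs += 1
    return n_pairs / len(ordered)


# Four calls forming two overlapping pairs
calls = [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0), (2.5, 3.5)]
ratio = overlap_ratio(calls)
```

Under this counting, a recording in which every call overlaps exactly one other call gives a ratio of 0.5, so a ratio of 1.0 requires some calls to take part in more than one overlapping pair.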
Dataset details
Each dataset includes three info files: train_info.csv, val_info.csv, and test_info.csv, which define the splits into train, validation, and test sets. Each info file has the following columns, where each row corresponds to one audio file:
- fn: Unique filename of the audio file.
- audio_fp: Relative filepath to the audio file.
- selection_table_filepath: Relative filepath to the annotations, which are in the form of a Raven selection table.
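As a minimal sketch of how an info file might be read, the following uses only the Python standard library and the three column names listed above. The function name `read_info_file` and the `dataset_root` parameter are hypothetical. Which directory the relative filepaths are relative to is not stated here, so the sketch assumes the folder containing the info file unless told otherwise.

```python
import csv
from pathlib import Path


def read_info_file(info_path, dataset_root=None):
    """Return one dict per audio file listed in an info CSV.

    Assumption: relative paths resolve against `dataset_root`, which
    defaults to the folder containing the info file.
    """
    info_path = Path(info_path)
    root = Path(dataset_root) if dataset_root is not None else info_path.parent
    records = []
    with open(info_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            records.append({
                "fn": row["fn"],
                "audio_path": root / row["audio_fp"],
                "selection_table_path": root / row["selection_table_filepath"],
            })
    return records
```

A call such as `read_info_file("MT/train_info.csv")` (the folder name is a guess) would then yield the audio and annotation paths for the training split.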
Selection tables are tab-separated `.txt` files. Each row corresponds to one annotated audio event (typically, a vocalization). Each selection table has (at least) the following columns:
- Begin Time (s): The time the event starts in the audio file
- End Time (s): The time the event ends in the audio file
- Annotation: A label (e.g. species) associated with the audio event
Some selection tables have no rows. This indicates that no events of interest occur in the corresponding recording.
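The selection table layout described above can be parsed with the standard `csv` module. The sketch below is illustrative rather than the loader used in the associated code: the function name `read_selection_table` is hypothetical, and only the three documented columns are read. A table with a header but no rows yields an empty list.

```python
import csv


def read_selection_table(path):
    """Return a list of (begin_s, end_s, label) tuples from a selection table.

    Only the three documented columns are used; any extra columns are ignored.
    """
    events = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            events.append((
                float(row["Begin Time (s)"]),
                float(row["End Time (s)"]),
                row["Annotation"],
            ))
    return events
```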
Associated papers and code
If you use this data, please cite the associated paper, which is currently available at https://arxiv.org/abs/2503.02389.
The code associated with the paper can be found at https://github.com/earthspecies/voxaboxen.
If you use any of the datasets besides OZF and OZF Synthetic, please also cite the original work (found in the license file for each dataset).
Files
Total size: 22.4 GB
MD5 checksum | Size |
---|---|
md5:1ee4e9f506ad13ac252a231d3c8dac3e | 7.2 GB |
md5:214cc0e4743641a682ee25030ee8e5bd | 1.2 GB |
md5:9b5cc24a6f377bfd17c777cf3bad8bcc | 8.6 GB |
md5:2f185b86e25f80df0a634d9520cc850d | 529.7 MB |
md5:643871706f49b70f58c70dae3f048414 | 2.4 GB |
md5:c82ee21ca7fe906e71abd2789ea462ad | 24.4 MB |
md5:12f85aded4f0f94c3cabb1f7f1a5048f | 352.2 MB |
md5:2043ea73143d08ab221c408e430fcabd | 639.4 MB |
md5:1485f3860c7fad14ca542e149f7aebd8 | 1.4 GB |
md5:2f4d19cadc5928411b9064bbf75089b2 | 4.5 kB |
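The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the path passed in is whatever local path the archive was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:1ee4...'."""
    # Accept the checksum with or without the leading 'md5:' label.
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte archives in this record.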
Additional details
Related works
- Is described by: Preprint arXiv:2503.02389 (arXiv)
Software
- Repository URL: https://github.com/earthspecies/voxaboxen
- Programming language: Python