There is a newer version of the record available.

Published March 26, 2025 | Version v1
Dataset Open

BioDCASE 2025 Task 2 : Development set

  • 1. Sorbonne Université
  • 2. École Nationale Supérieure de Techniques Avancées
  • 3. IMT Atlantique
  • 4. Flanders Marine Institute
  • 5. Australian Antarctic Division
  • 6. France Marine Renewable Energies
  • 7. Centre d'Etudes Biologiques de Chizé
  • 8. National Institute for Research in Computer and Control Sciences
  • 9. University of Southampton
  • 10. Curtin University
  • 11. University of Western Brittany
  • 12. San Diego State University

Description

General description 

This repository contains all development data for Task 2, "Supervised Detection of Strongly-Labelled Whale Calls", of the BioDCASE 2025 challenge. It is derived from the ATBFL library and adapted to better fit the challenge. ATBFL is one of the largest annotated datasets in marine bioacoustics, gathering underwater recordings made around Antarctica from 2005 to 2017. Overall, the data comprise:
  • 6591 audio files (6004 in the train set + 587 in the validation set) totaling 1880 hours of recordings from 11 different deployments organized in site-year datasets (e.g., kerguelen-2015);
  • 11 CSV annotation files named after each corresponding site-year dataset. See "Annotation structure" for more details.

Folder structure 

biodcase_development_set.zip
|___biodcase_development_set/
      |____train/
            |____annotations/
                  |____site_year1.csv
                  |____site_year2.csv
                  |____...
            |____audio/
                  |____site-year1/
                        |____*.wav
                  |____site-year2/
                  |____...
      |____validation/
            |____annotations/
                  |____site_year1.csv
                  |____site_year2.csv
                  |____...
            |____audio/
                  |____site-year1/
                        |____*.wav
                  |____site-year2/
                  |____...
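The layout above can be walked with a few lines of Python. This is only a sketch: the helper names `list_audio_files` and `list_annotation_files` are hypothetical, and it assumes the archive has been unzipped so that `root` points at the `biodcase_development_set/` folder.

```python
from pathlib import Path


def list_audio_files(root, split):
    """Map each site-year folder under <root>/<split>/audio to its sorted .wav paths."""
    audio_dir = Path(root) / split / "audio"
    return {
        site_dir.name: sorted(site_dir.glob("*.wav"))
        for site_dir in sorted(audio_dir.iterdir())
        if site_dir.is_dir()
    }


def list_annotation_files(root, split):
    """Return the sorted annotation CSV paths under <root>/<split>/annotations."""
    return sorted((Path(root) / split / "annotations").glob("*.csv"))
```

For example, `list_audio_files("biodcase_development_set", "train")` would return one entry per site-year folder of the train split, and `split="validation"` does the same for the validation split.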

        

Annotation structure 

Each annotated sound event is defined by the tuple (dataset,filename,annotation,annotator,low_frequency,high_frequency,start_datetime,end_datetime), where annotation is the class label and takes a single value in {bma, bmb, bmz, bmd, bpd, bp20, bp20plus}. Note that these labels are a more machine-readable version of the list {BmA, BmB, BmZ, BmD, BpD, Bp20, Bp20plus} described below in the section "Calls description". The annotator field gives the short name of the expert annotator who produced the annotation file. There is a single annotator per dataset, but the same annotator may have annotated several datasets.
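A minimal sketch of reading such annotations with the Python standard library. It assumes each CSV has a header row using exactly the field names of the tuple above, that the frequency fields are numeric, and that timestamps are ISO 8601 strings with a UTC offset. The `Annotation` class, the `read_annotations` helper, and every value in the sample row are illustrative, not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Annotation:
    dataset: str
    filename: str
    label: str
    annotator: str
    low_frequency: float
    high_frequency: float
    start: datetime
    end: datetime


def read_annotations(handle):
    """Parse rows from an open CSV file (or any text stream) into Annotation objects."""
    return [
        Annotation(
            dataset=row["dataset"],
            filename=row["filename"],
            label=row["annotation"],
            annotator=row["annotator"],
            low_frequency=float(row["low_frequency"]),
            high_frequency=float(row["high_frequency"]),
            start=datetime.fromisoformat(row["start_datetime"]),
            end=datetime.fromisoformat(row["end_datetime"]),
        )
        for row in csv.DictReader(handle)
    ]


# Hypothetical sample row, for illustration only.
SAMPLE = (
    "dataset,filename,annotation,annotator,low_frequency,high_frequency,"
    "start_datetime,end_datetime\n"
    "kerguelen2014,2014-03-01T00-00-00_000.wav,bma,expert1,24.0,27.5,"
    "2014-03-01T00:00:10.000000+00:00,2014-03-01T00:00:22.500000+00:00\n"
)

events = read_annotations(io.StringIO(SAMPLE))
print(events[0].label, (events[0].end - events[0].start).total_seconds())
```

With a real file, the same function can be called on `open(path, newline="")` instead of the in-memory sample.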

As calls may overlap, the setup is multi-class and multi-label: one file, or one segment of a file, is likely to contain several classes.
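One common way to represent overlapping calls is a multi-hot target per time frame, where several classes can be active at once. The sketch below is only an illustration of that idea: the helper `multi_hot_frames`, the 1 s frame length, and the example events are assumptions, not part of the challenge specification.

```python
import math

LABELS = ["bma", "bmb", "bmz", "bmd", "bpd", "bp20", "bp20plus"]


def multi_hot_frames(events, segment_duration, frame_duration):
    """Build one 0/1 vector per frame; a frame is marked if an event touches it.

    events: iterable of (label, start_s, end_s), in seconds from the segment start.
    """
    n_frames = math.ceil(segment_duration / frame_duration)
    targets = [[0] * len(LABELS) for _ in range(n_frames)]
    for label, start_s, end_s in events:
        column = LABELS.index(label)
        first = max(0, int(start_s // frame_duration))
        last = min(n_frames, math.ceil(end_s / frame_duration))
        for frame in range(first, last):
            targets[frame][column] = 1
    return targets


# Two overlapping hypothetical calls in a 4 s segment, with 1 s frames.
frames = multi_hot_frames([("bma", 0.5, 2.2), ("bp20", 1.0, 3.0)], 4.0, 1.0)
```

Here frames 1 and 2 carry both the bma and the bp20 label, which is exactly the multi-label case described above.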

Please note that the evaluation metric is a 1D IoU (intersection over union) computed over the temporal axis: the frequency bounds are provided but will not be part of the final evaluation.
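For reference, the IoU of two time intervals can be computed as below. This is a generic sketch of the 1D IoU formula only; how the challenge matches predictions to annotations (thresholds, one-to-one matching, and so on) is not specified here, and the function name `temporal_iou` is hypothetical.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) intervals, in seconds."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Intervals [0, 10] and [5, 15] share 5 s out of a 15 s union.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```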

Please note that 7 classes are provided, but the evaluation only takes 3 into account, since calls are grouped by similarity according to this table:

Annotation label | bma | bmb | bmz | bmd | bpd | bp20 | bp20plus
Evaluated class | ABZ call | ABZ call | ABZ call | Downsweep | Downsweep | Bp call | Bp call
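The grouping in this table can be written as a lookup. The dictionary below copies the table; the names `EVAL_CLASS` and `to_eval_class` are hypothetical, and the class strings are simply the table's wording rather than official identifiers.

```python
# Annotation label -> class used for evaluation, as given in the table above.
EVAL_CLASS = {
    "bma": "ABZ call",
    "bmb": "ABZ call",
    "bmz": "ABZ call",
    "bmd": "Downsweep",
    "bpd": "Downsweep",
    "bp20": "Bp call",
    "bp20plus": "Bp call",
}


def to_eval_class(label):
    """Map one of the 7 annotation labels (case-insensitive) to its evaluated class."""
    return EVAL_CLASS[label.lower()]
```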
 

Calls description

BmA, BmB, and BmZ calls are specific to blue whales (Balaenoptera musculus intermedia, Bm), while Bp20 and Bp20plus calls are characteristic of fin whales (Balaenoptera physalus quoyi, Bp). Both species also produce downsweeps.

As described by Miller et al. (see "Related Works"), BmA calls consist of a constant-frequency tone between 25 and 28 Hz, without additional units. BmB calls are similar but followed by a partial or full inter-tone downsweep. BmZ calls contain two tonal units: A (higher frequency) and C (lower frequency). Occasionally, a B downsweep unit appears between them, forming a "Z" shape on spectrograms.

Bp20 and Bp20plus vocalizations are pulsed calls with peak energy at 20 Hz; Bp20plus calls carry additional energy at higher frequencies (80–100 Hz).

Downsweeps are characterized by a continuous frequency modulation from f₁ to f₂, where f₁ > f₂. Example spectrograms are available here, with more detailed references in Miller et al.

Dataset statistics

Train set

Dataset | Number of audio recordings | Total audio duration (h) | Positive duration (h:m.s) | Total sound events
ballenyisland2015 | 205 | 204 | 02:46.32 | 2222
casey2014 | 194 | 194 | 14:12.00 | 6866
elephantisland2013 | 2247 | 187 | 16:06.58 | 21966
elephantisland2014 | 2595 | 216 | 28:08.25 | 20964
greenwich2015 | 190 | 32 | 02:03.11 | 1128
kerguelen2005 | 200 | 200 | 03:31.37 | 2960
maudrise2014 | 200 | 83 | 05:42.45 | 2360
rosssea2014 | 176 | 176 | 08:51.00 | 104
TOTAL | 6004 | 1292 | 72:40.20 | 58570

Validation set

Dataset | Number of audio recordings | Total audio duration (h) | Positive duration (h:m.s) | Total sound events
casey2017 | 187 | 187 | 06:06.42 | 3263
kerguelen2014 | 200 | 200 | 11:23.02 | 8822
kerguelen2015 | 200 | 200 | 07:23.30 | 5542
TOTAL | 587 | 587 | 24:53.14 | 17627

A more complete version of this table is available here, with more statistics within the different classes and more information on the recording deployments.
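As a worked example of reading the "Positive duration" column, the sketch below re-derives the TOTAL row of the validation table. It assumes the "h:m.s" format means hours:minutes.seconds; the helper names `parse_hms` and `format_hms` are hypothetical.

```python
def parse_hms(text):
    """Convert an 'hh:mm.ss' string to a number of seconds."""
    hours, rest = text.split(":")
    minutes, seconds = rest.split(".")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def format_hms(total_seconds):
    """Convert a number of seconds back to an 'hh:mm.ss' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}.{seconds:02d}"


# Positive durations and total audio hours copied from the validation table.
positive = {"casey2017": "06:06.42", "kerguelen2014": "11:23.02", "kerguelen2015": "07:23.30"}
audio_hours = {"casey2017": 187, "kerguelen2014": 200, "kerguelen2015": 200}

total_positive = sum(parse_hms(value) for value in positive.values())
positive_fraction = total_positive / (sum(audio_hours.values()) * 3600)

print(format_hms(total_positive))  # 24:53.14, matching the TOTAL row
print(f"{positive_fraction:.1%}")
```

The second printed value is the share of validation audio that contains annotated calls, which gives a sense of how sparse positive events are.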

Evaluation set

The evaluation set for this task will be released on June 1, 2025. 

Data collection and curation

The original data were collected, curated, annotated and published in: Miller, B.S., The IWC-SORP/SOOS Acoustic Trends Working Group., Balcazar, N. et al. An open access dataset for developing automated detectors of Antarctic baleen whale sounds and performance evaluation of two commonly used detectors. Sci Rep 11, 806 (2021). https://doi.org/10.1038/s41598-020-78995-8 (see the "Related works" section).

Minor changes were made to the original data and annotations to fit the challenge format:

  • adopting more consistent naming conventions, particularly for file paths. All .wav filenames now follow the format "YYYY-mm-ddTHH-MM-SS_fff.wav", and the same convention is used in the CSV files under the "filename" column. Additionally, the start and end timestamps of temporal annotation boxes are formatted as "YYYY-mm-ddTHH:MM:SS.ffffff+zz:zz", all in UTC (Zulu time);
  • pooling together call-type-specific Raven tables into one CSV file per subdataset, named after the annotated site-year;
  • resampling all audio files to 250 Hz, the lowest original sample rate, shared by several subdatasets. This standardizes sample rates, and because the vocalizations of interest are very low-frequency calls (15–120 Hz), they remain below the 125 Hz Nyquist frequency, so no information is lost at a sampling rate of 250 Hz.
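These conventions make it possible to locate an annotation inside its audio file. The sketch below assumes that the timestamp in a filename marks the start of the recording in UTC and that "fff" denotes milliseconds; neither is stated explicitly above. The helper names and the example filename and timestamps are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

SAMPLE_RATE = 250  # Hz, the common rate after resampling


def file_start_from_name(filename):
    """Parse 'YYYY-mm-ddTHH-MM-SS_fff.wav' into a timezone-aware UTC datetime."""
    stem = Path(filename).stem
    return datetime.strptime(stem, "%Y-%m-%dT%H-%M-%S_%f").replace(tzinfo=timezone.utc)


def event_sample_range(filename, start_datetime, end_datetime, sample_rate=SAMPLE_RATE):
    """Return (start_sample, end_sample) of an annotation relative to its file start."""
    file_start = file_start_from_name(filename)
    start_s = (datetime.fromisoformat(start_datetime) - file_start).total_seconds()
    end_s = (datetime.fromisoformat(end_datetime) - file_start).total_seconds()
    return round(start_s * sample_rate), round(end_s * sample_rate)


# Hypothetical annotation starting 10 s and ending 12.4 s into the file.
print(event_sample_range(
    "2015-02-04T03-00-00_000.wav",
    "2015-02-04T03:00:10.000000+00:00",
    "2015-02-04T03:00:12.400000+00:00",
))  # (2500, 3100)
```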

Open access

The data used in this study are publicly available under a Creative Commons Attribution 4.0 licence. They are attributed to Miller, B.S., The IWC-SORP/SOOS Acoustic Trends Working Group., Balcazar, N. et al. They can be accessed via the Australian Antarctic Data Centre at https://data.aad.gov.au/metadata/records/AcousticTrends_BlueFinLibrary

Contact info 

Please send any feedback or questions to:

Dorian CAZAU (dorian.cazau@ensta.fr) | Lucie JEAN-LABADYE (lucie.jean-labadye@sorbonne-universite.fr) | Cléa PARCERISAS (clea.parcerisas@vliz.be)

Files

biodcase_development_set.zip (5.6 GB)
md5:f7899c93aff78a7fa12f7232cde5aa79
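After downloading, the archive can be checked against the MD5 checksum listed above. This is a small sketch using Python's `hashlib`; the function name `md5_of_file` and the local path in the usage note are assumptions.

```python
import hashlib

EXPECTED_MD5 = "f7899c93aff78a7fa12f7232cde5aa79"  # checksum listed for the archive


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the archive was saved in the current directory, `md5_of_file("biodcase_development_set.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download.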

Additional details

Related works

Is described by
Publication: 10.1038/s41598-020-78995-8 (DOI)