Published June 16, 2023 | Version v3
Peer review Open

AnuraSet: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring

  • 1. Instituto de Investigación de Recursos Biológicos Alexander von Humboldt
  • 2. K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University
  • 3. Pontificia Universidad Javeriana
  • 4. Universidade Estadual de Campinas
  • 5. University of Campinas
  • 6. Instituto de Pesquisa da Biodiversidade
  • 7. Universidade Federal de Mato Grosso do Sul
  • 8. Universidade Federal de Santa Catarina
  • 9. Universidade Federal de Goiás
  • 10. Terrestrial Ecology Group, Departamento de Ecología, Universidad Autónoma de Madrid

Description

raw_data.zip

Raw audio data collected in the field. The archive contains one sub-folder per monitoring site, and each sub-folder holds .wav audio files named {site}_{date}_{time}.wav.
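The naming pattern above can be split back into its three fields. The following is a minimal sketch, not part of the dataset tooling: the helper name `parse_recording_name` and the example file name are hypothetical, and the exact date and time formats are not stated here, so both fields are kept as plain strings.

```python
from pathlib import PurePath
from typing import NamedTuple


class RecordingName(NamedTuple):
    site: str
    date: str
    time: str


def parse_recording_name(filename: str) -> RecordingName:
    """Split a '{site}_{date}_{time}.wav' file name into its fields.

    Splitting from the right keeps any underscore inside the site
    code intact (an assumption; the site naming is not documented here).
    """
    stem = PurePath(filename).stem  # drop directories and the extension
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    return RecordingName(*parts)


# Hypothetical file name, used only to illustrate the pattern.
print(parse_recording_name("SITE_20200101_120000.wav"))
```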

weak_labels.csv

Annotations at the 1-minute level, where each raw audio file is assigned a value representing the anuran calling activity: 0 is Absence; 1 is Low; 2 is Moderate; and 3 is High. The CSV file has two columns identifying the site and the file name, followed by species columns holding the anuran calling activity.
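As an illustration of this layout, the sketch below reads such a table and translates the numeric codes into the activity levels listed above. It assumes that the two identifying columns come first and that every remaining column is a species column holding an integer code; the header names and the row in the usage example are hypothetical, not taken from the real file.

```python
import csv
import io

# Activity codes as described for weak_labels.csv.
ACTIVITY = {0: "absence", 1: "low", 2: "moderate", 3: "high"}


def read_weak_labels(handle, n_id_columns=2):
    """Return a list of (identifiers, {species: activity level}) pairs.

    Assumes the first `n_id_columns` columns identify the recording
    (site, file name) and all other columns are species columns.
    """
    reader = csv.reader(handle)
    header = next(reader)
    species = header[n_id_columns:]
    records = []
    for row in reader:
        ids = tuple(row[:n_id_columns])
        levels = {
            name: ACTIVITY[int(value)]
            for name, value in zip(species, row[n_id_columns:])
        }
        records.append((ids, levels))
    return records


# Hypothetical two-species table, for illustration only.
sample = "site,fname,SPECIES_A,SPECIES_B\nSITE1,SITE1_x.wav,0,3\n"
print(read_weak_labels(io.StringIO(sample)))
```

With the real file, the same function could be called on `open("weak_labels.csv", newline="")`, provided the column order matches the assumption above.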

strong_labels.zip

Annotations at a high temporal resolution, giving the temporal limits (beginning and end) of audio segments that contain species-specific calls with an inter-call interval of less than 1 second. As in raw_data, each sub-folder represents a monitoring site. The files are .txt files containing (i) the call beginning; (ii) the call end; and (iii) the species name and audio quality.
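One way to read the inter-call rule is that calls of the same species separated by less than 1 second belong to the same annotated segment. The sketch below illustrates that reading only; it is not the annotators' procedure, and the function name `merge_calls` and the example times are hypothetical.

```python
def merge_calls(calls, max_gap=1.0):
    """Group (start, end) call times, in seconds, into segments.

    Two consecutive calls are placed in the same segment when the
    silence between them is shorter than `max_gap` seconds.
    """
    merged = []
    for start, end in sorted(calls):
        if merged and start - merged[-1][1] < max_gap:
            # Gap is short enough: extend the current segment.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap is too long (or first call): open a new segment.
            merged.append([start, end])
    return [tuple(segment) for segment in merged]


# Hypothetical call times for a single species.
print(merge_calls([(0.0, 0.5), (1.0, 1.4), (3.0, 3.2)]))
# [(0.0, 1.4), (3.0, 3.2)]
```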

anuraset.zip

Preprocessed dataset of 93378 3-second audio samples, intended as input for benchmarking. The dataset folder contains two files and one folder, which in turn holds a separate folder per site. The samples are WAV audio files with a fixed length of 3 seconds, a sampling frequency of 22.05 kHz, and a 16-bit depth. The two files are a README describing the structure and construction of the dataset and a metadata CSV file containing the labels for each sample.
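The stated audio parameters fix the size of each sample, which can be handy as a sanity check after extraction. The arithmetic below is a sketch that assumes single-channel (mono) audio, which is not stated above, and ignores the WAV header.

```python
SAMPLE_RATE_HZ = 22_050   # 22.05 kHz, as stated above
DURATION_S = 3            # fixed sample length
BYTES_PER_FRAME = 2       # 16-bit depth, assuming one channel (mono)
N_SAMPLES = 93_378        # number of samples in anuraset.zip

frames_per_sample = SAMPLE_RATE_HZ * DURATION_S            # 66150 frames
pcm_bytes_per_sample = frames_per_sample * BYTES_PER_FRAME  # 132300 bytes
total_pcm_bytes = pcm_bytes_per_sample * N_SAMPLES

print(frames_per_sample, pcm_bytes_per_sample)
print(f"uncompressed audio payload: {total_pcm_bytes / 1e9:.2f} GB")
```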

Notes

Abstract: Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibian calls recorded by PAM, comprising 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found in our GitHub repository: https://github.com/soundclim/anuraset.

Files

Files (18.6 GB)

Size      Checksum
11.4 GB   md5:7950ac82e288113c102b2a7ffa05b9dc
7.2 GB    md5:1d7c6f353a969615f3fb10b1a0488b9e
1.7 kB    md5:ff253e3b8948eae0e1329145394ede24
584.3 kB  md5:e91913077584e475469b84486ffe9f15
186.8 kB  md5:3806a76cc7e6d6b9186a5c7bef442723
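The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`. Because the listing does not pair each checksum with a file name, the helper (hypothetical name `matches_listing`) only checks whether a file's digest appears anywhere in the listed set.

```python
import hashlib

# Checksums copied from the file listing above.
LISTED_MD5 = {
    "7950ac82e288113c102b2a7ffa05b9dc",
    "1d7c6f353a969615f3fb10b1a0488b9e",
    "ff253e3b8948eae0e1329145394ede24",
    "e91913077584e475469b84486ffe9f15",
    "3806a76cc7e6d6b9186a5c7bef442723",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listing("anuraset.zip")` should return `True` for an intact download of that archive.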