AnuraSet: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring

Cañas, Juan Sebastián; Toro-Gómez, Maria Paula; Moreira Sugai, Larissa Sayuri; Benítez Restrepo, Hernán Darío; Rudas, Jorge; Posso Bautista, Breyner; Toledo, Luis Felipe; Dena, Simone; Rosa Domingos, Adão Henrique; De Souza, Franco Leandro; De Oliveira, Selvino Neckel; Da Rosa, Anderson; Carvalho-Rocha, Vítor; Bernardy, José Vinícius; Massao Moreira Sugai, José Luiz; dos Santos, Carolina Emília; Pereira Bastos, Rogério; Llusia, Diego; Ulloa, Juan Sebastián

doi:10.5281/zenodo.8342596

Published June 16, 2023 | Version v3

Peer review Open

1. Instituto de Investigación de Recursos Biológicos Alexander von Humboldt
2. K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University
3. Pontificia Universidad Javeriana
4. Universidade Estadual de Campinas
5. University of Campinas
6. Instituto de Pesquisa da Biodiversidade
7. Universidade Federal de Mato Grosso do Sul
8. Universidade Federal de Santa Catarina
9. Universidade Federal de Goiás
10. Terrestrial Ecology Group, Departamento de Ecología, Universidad Autónoma de Madrid

raw_data.zip

Raw audio data collected in the field. The archive is organized into sub-folders, one per monitoring site. Each sub-folder contains .wav audio files named {site}_{date}_{time}.wav.
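The naming pattern above can be split back into its fields when indexing the recordings. The following is a minimal sketch, not part of the dataset tooling: the helper name `parse_recording_name` and the example file name are hypothetical, and it assumes that the date and time fields contain no underscores (their exact formats are not specified here).

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class RecordingName(NamedTuple):
    site: str
    date: str
    time: str


def parse_recording_name(path: str) -> RecordingName:
    """Split a '{site}_{date}_{time}.wav' file name into its three fields."""
    stem = PurePosixPath(path).stem  # drop folders and the .wav extension
    # Split from the right so that a site code containing '_' stays intact
    # (assumption: date and time themselves contain no underscores).
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {path!r}")
    return RecordingName(*parts)


# Hypothetical example name, used only to illustrate the pattern.
print(parse_recording_name("SITE01/SITE01_20200101_180000.wav"))
```

The fields are kept as strings because the date and time encodings would need to be confirmed against the actual files before converting them.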

weak_labels.csv

Annotations at 1-minute resolution, in which each raw audio file is assigned a value representing anuran calling activity: 0 is absence; 1 is Low; 2 is Moderate; and 3 is High. The CSV file has two columns giving the site and the file name, followed by species columns holding the calling activity.
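As an illustration of the 0 to 3 activity scale, the sketch below reads a tiny in-memory stand-in for the CSV and maps each value to its label. The header names (`site`, `file_name`, `SPECIES_A`, `SPECIES_B`) and the rows are hypothetical placeholders; the real column names should be taken from weak_labels.csv itself.

```python
import csv
import io

# Activity scale as described for weak_labels.csv.
ACTIVITY_LEVELS = {0: "absence", 1: "low", 2: "moderate", 3: "high"}

# Tiny stand-in for weak_labels.csv; header names and values are hypothetical.
SAMPLE = """site,file_name,SPECIES_A,SPECIES_B
SITE01,SITE01_20200101_180000,0,3
SITE01,SITE01_20200101_190000,2,1
"""


def read_weak_labels(handle, id_columns=("site", "file_name")):
    """Return {(site, file_name): {species: activity_label}}."""
    labels = {}
    for row in csv.DictReader(handle):
        key = tuple(row[column] for column in id_columns)
        labels[key] = {
            species: ACTIVITY_LEVELS[int(value)]
            for species, value in row.items()
            if species not in id_columns
        }
    return labels


weak = read_weak_labels(io.StringIO(SAMPLE))
print(weak[("SITE01", "SITE01_20200101_180000")])
```

To read the real file, the same function could be called on `open("weak_labels.csv", newline="")` with `id_columns` adjusted to the actual header.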

strong_labels.zip

Annotations at high temporal resolution, giving the temporal limits (beginning and end) of audio segments containing species-specific calls with an inter-call interval of less than 1 second. As in raw_data, each sub-folder represents a monitoring site. The files are .txt files containing (i) the call beginning; (ii) the call end; and (iii) the species name and audio quality.
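A label file with these three fields could be parsed along the following lines. This is a sketch under stated assumptions: fields are taken to be whitespace-separated with one segment per line, times are taken to be in seconds, and the third field (species name and audio quality) is kept as one raw string because its internal encoding is not described here. The sample labels are hypothetical.

```python
from typing import List, NamedTuple


class CallSegment(NamedTuple):
    start_s: float  # call beginning (assumed to be in seconds)
    end_s: float    # call end (assumed to be in seconds)
    label: str      # species name and audio quality, kept as written


def parse_strong_labels(text: str) -> List[CallSegment]:
    """Parse lines of the form '<begin> <end> <label>' into segments."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        segments.append(CallSegment(float(start), float(end), label.strip()))
    return segments


# Hypothetical file content, used only to illustrate the layout.
SAMPLE = "0.50\t2.25\tSPECIESA_M\n3.00\t4.10\tSPECIESB_L\n"
segments = parse_strong_labels(SAMPLE)
annotated_seconds = sum(s.end_s - s.start_s for s in segments)
print(len(segments), "segments")
```

Summing `end_s - start_s` over all files is one way to check the total annotated duration per species.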

anuraset.zip

Preprocessed dataset of 93,378 3-second audio samples used as input for benchmarking. The dataset folder contains two files and one folder, which holds a separate folder per site. The samples are WAV audio files with a fixed 3-second length, a 22.05 kHz sampling frequency, and 16-bit depth. The two files are a README describing the structure and construction of the dataset and a metadata CSV file containing the labels for each sample.
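At 22.05 kHz, a 3-second clip holds 22,050 × 3 = 66,150 frames, each 2 bytes wide per channel at 16-bit depth. The sketch below uses Python's standard `wave` module to check a file against that format. The helper names are hypothetical, the synthetic clip is only there to make the example runnable, and the check assumes clips are exactly 3 seconds long; the channel count is not checked because it is not stated here.

```python
import io
import wave

EXPECTED_RATE_HZ = 22_050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit depth
EXPECTED_DURATION_S = 3
EXPECTED_FRAMES = EXPECTED_RATE_HZ * EXPECTED_DURATION_S  # 66,150 frames


def matches_sample_format(wav_file) -> bool:
    """Check rate, bit depth, and length of a WAV file (path or file object)."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnframes() == EXPECTED_FRAMES
        )


def make_silent_clip(n_frames: int = EXPECTED_FRAMES) -> io.BytesIO:
    """Build an in-memory mono WAV of silence (a stand-in for a real sample)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono, for the synthetic clip only
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(EXPECTED_RATE_HZ)
        wav.writeframes(b"\x00\x00" * n_frames)
    buffer.seek(0)
    return buffer


print(matches_sample_format(make_silent_clip()))
```

The same function accepts a file path, so it could be run over the extracted sample folders as a sanity check after download.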

Notes

Abstract: Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibian calls recorded by PAM that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found in our GitHub repository: https://github.com/soundclim/anuraset.

Files

Files (18.6 GB)

Name	MD5	Size
anuraset.zip	7950ac82e288113c102b2a7ffa05b9dc	11.4 GB
raw_data.zip	1d7c6f353a969615f3fb10b1a0488b9e	7.2 GB
species.csv	ff253e3b8948eae0e1329145394ede24	1.7 kB
strong_labels.zip	e91913077584e475469b84486ffe9f15	584.3 kB
weak_labels.csv	3806a76cc7e6d6b9186a5c7bef442723	186.8 kB
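The MD5 checksums listed with the files can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the local file paths are assumptions, while the checksum values are copied from the file listing above.

```python
import hashlib
import io

# Checksums copied from the file listing.
EXPECTED_MD5 = {
    "anuraset.zip": "7950ac82e288113c102b2a7ffa05b9dc",
    "raw_data.zip": "1d7c6f353a969615f3fb10b1a0488b9e",
    "species.csv": "ff253e3b8948eae0e1329145394ede24",
    "strong_labels.zip": "e91913077584e475469b84486ffe9f15",
    "weak_labels.csv": "3806a76cc7e6d6b9186a5c7bef442723",
}


def md5_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large archives fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(name: str, path: str) -> bool:
    """Compare a local file (path is an assumption) with the listed checksum."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == EXPECTED_MD5[name]


# Self-check on in-memory bytes; real use would call verify_download(...).
print(md5_of_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

For example, `verify_download("species.csv", "species.csv")` would return `True` for an intact copy saved in the current directory.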

	All versions	This version
Views	1,853	1,109
Downloads	1,825	1,448
Data volume	18.3 TB	15.2 TB
