Published March 26, 2026 | Version v3
Dataset Open

PteroSet

Description

This release provides a curated passive acoustic monitoring dataset of Neotropical bird vocalizations recorded in Colombia (Puerto Asís, Putumayo; Pijavay, Magdalena) between 2023 and 2025.

Audio was collected using autonomous AudioMoth recorders and stored in WAV format, distributed in audios.zip. The collection comprises 563 recordings totaling 73.62 hours at a 192 kHz sample rate. The dataset contains 15,372 curated bird-event annotations at the taxonomic-group level, of which 6,702 also carry species-level determinations, encoded with the standardized species codes defined in the species.csv lookup table.
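
As a quick orientation, the headline figures above can be turned into a few derived quantities. The sketch below uses only the numbers quoted in this description; the variable names are illustrative and are not part of the dataset.

```python
# Figures quoted in the dataset description.
n_recordings = 563
total_hours = 73.62
sample_rate_hz = 192_000
n_annotations = 15_372
n_species_level = 6_702

# Mean recording length, assuming the total is spread over all recordings.
mean_duration_s = total_hours * 3600 / n_recordings

# Highest frequency representable at this sample rate (Nyquist limit).
nyquist_hz = sample_rate_hz / 2

# Share of annotations that carry a species-level determination.
species_fraction = n_species_level / n_annotations

print(f"mean recording duration: {mean_duration_s:.1f} s")
print(f"Nyquist frequency: {nyquist_hz / 1000:.0f} kHz")
print(f"species-level share: {100 * species_fraction:.1f} %")
```

This is a mean only; individual recording lengths may differ and should be read from the audio files themselves.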

Each recording is paired with expert-generated annotations created in Raven Pro (labels.zip) and subsequently harmonized into a COCO-inspired JSON schema for bioacoustics. These annotations are provided in two versions: annotations_identification.json, containing all annotations, and annotations_species.json, containing only species-level annotations. Annotations are provided as strong labels with explicit temporal boundaries (t_min, t_max) and spectral bounds (f_min, f_max), enabling time-frequency localization and event detection tasks.
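
To illustrate how a strong label with temporal and spectral bounds can be used for time-frequency localization, here is a minimal sketch that maps one annotation box onto spectrogram indices. Only the field names t_min, t_max, f_min, and f_max come from the description above. The Event class, the to_spectrogram_box helper, the assumed units (seconds and hertz), and the STFT parameters are hypothetical choices made for this example, and the actual JSON layout should be checked against the files.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One strong-label annotation (hypothetical container).

    Units are an assumption: seconds for time, hertz for frequency.
    """
    t_min: float
    t_max: float
    f_min: float
    f_max: float

    @property
    def duration(self) -> float:
        return self.t_max - self.t_min

    @property
    def bandwidth(self) -> float:
        return self.f_max - self.f_min


def to_spectrogram_box(ev: Event, sample_rate: int = 192_000,
                       n_fft: int = 2048, hop: int = 512):
    """Return (col_start, col_stop, row_start, row_stop) for an STFT grid.

    n_fft and hop are illustrative defaults, not values from the dataset.
    """
    frames_per_s = sample_rate / hop
    hz_per_bin = sample_rate / n_fft
    col_start = int(ev.t_min * frames_per_s)
    col_stop = math.ceil(ev.t_max * frames_per_s)
    row_start = int(ev.f_min / hz_per_bin)
    # Clamp to the highest available frequency bin.
    row_stop = min(math.ceil(ev.f_max / hz_per_bin), n_fft // 2)
    return col_start, col_stop, row_start, row_stop


# Made-up example values, not taken from the annotations.
example = Event(t_min=1.0, t_max=2.5, f_min=3000.0, f_max=6000.0)
box = to_spectrogram_box(example)
print(example.duration, example.bandwidth, box)
```

Rounding the start indices down and the stop indices up keeps the whole annotated event inside the box, at the cost of including a little surrounding context.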

Deployment details are distributed in metadata.csv. Additionally, model weights are included in checkpoints.zip, enabling benchmarking of bioacoustic models on this dataset. To ensure reproducibility, all code used for data processing and technical validation is available at microsoft/PteroSet.

Files

Files (86.4 GB)

Size        MD5
4.9 MB      7e7ed085752da3ae6fd126c608b3b966
2.3 MB      fefae0bd5b9cad4ead154ebf4dfeaf37
85.8 GB     1338d78735f6a4ce95f727c8b85c7803
619.3 MB    e66f6ad94a1bb2f992fd2753c4de6072
622.6 kB    91457497ae9692708c51171de5a8237d
105.9 kB    5901bef3d4e40ff8a16a47a0d39c6b6b
6.6 kB      ab0dd4b6ecfd526c749b86bd6a5e71cc
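
The MD5 checksums listed above can be used to confirm that a download completed without corruption, which is worth doing for the larger archives. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of and verify and the chunk size are illustrative choices, and the local path in the usage comment is a placeholder.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Return True when the file's digest matches the published checksum."""
    return md5_of(path) == expected_md5.strip().lower()


# Usage (placeholder path; pair each file with its checksum from the listing):
# ok = verify("path/to/downloaded_file", "7e7ed085752da3ae6fd126c608b3b966")
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case the file should be downloaded again.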