Published May 14, 2025 | Version v1
Dataset | Open access

The Xeno-Canto Dawn Chorus (XCDC) Dataset

  • 1. University of Massachusetts, Amherst

Description

We construct a dataset of species-rich audio by sampling dawn chorus recordings from Xeno-Canto, a global archive of natural sound recordings. The dawn chorus is a period of intense, multi-species vocal activity that occurs near sunrise, particularly during the breeding season. To isolate these species-rich recordings, we select audio that (1) was recorded during the spring dawn chorus, (2) is at least 3 minutes in duration, and (3) contains annotations for at least 10 distinct species. This filtering yields 576 recordings with associated geographic coordinates and species labels. We refer to this dataset as the Xeno-Canto Dawn Chorus (XCDC), a new benchmark for evaluating geolocation from species-rich soundscapes.
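The three selection criteria above can be sketched as a simple predicate over recording metadata. This is a minimal illustration, not the authors' pipeline. The `Recording` fields, the `is_dawn_chorus` flag, and the month-based definition of spring are hypothetical assumptions. This page does not state how dawn-chorus timing or the spring season were actually determined, nor the column names in `xcdc_recordings.csv`.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record schema: field names are illustrative assumptions,
# not the actual columns of xcdc_recordings.csv or the Xeno-Canto API.
@dataclass
class Recording:
    rec_id: str
    duration_s: float
    species: set[str]
    lat: float
    lon: float
    recorded_on: date
    is_dawn_chorus: bool  # assumed flag; how dawn timing is derived is not specified

MIN_DURATION_S = 3 * 60  # criterion (2): at least 3 minutes
MIN_SPECIES = 10         # criterion (3): at least 10 distinct species

def is_spring(d: date, lat: float) -> bool:
    # Assumption: meteorological spring (Mar-May in the Northern Hemisphere,
    # Sep-Nov in the Southern Hemisphere). The season definition used for
    # XCDC is not stated on this page.
    months = {3, 4, 5} if lat >= 0 else {9, 10, 11}
    return d.month in months

def keep(r: Recording) -> bool:
    # Criterion (1): spring dawn chorus; (2) duration; (3) species count.
    return (
        r.is_dawn_chorus
        and is_spring(r.recorded_on, r.lat)
        and r.duration_s >= MIN_DURATION_S
        and len(r.species) >= MIN_SPECIES
    )

def filter_recordings(recs: list[Recording]) -> list[Recording]:
    return [r for r in recs if keep(r)]
```

Applying `filter_recordings` to a list of candidate recordings returns only those satisfying all three criteria; on the real Xeno-Canto metadata the authors report that their filtering yields 576 recordings.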

For detailed documentation of the dataset, please refer to the following repository: https://github.com/cvl-umass/nat-sound2loc-benchmark. For the benchmarking codebase, please refer to: https://github.com/cvl-umass/nat-sound2loc-code
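Geolocation benchmarks commonly score a prediction by the great-circle distance between predicted and true coordinates, often reported as accuracy within a distance threshold. The sketch below assumes that convention; the metrics and thresholds actually implemented in the benchmarking codebase are not described on this page, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two (lat, lon) points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def accuracy_within(preds, truths, threshold_km: float) -> float:
    # Fraction of predictions within threshold_km of the ground-truth location.
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return hits / len(truths)
```

For example, `accuracy_within([(0, 0), (10, 10)], [(0, 0), (0, 0)], 100)` returns 0.5: the first prediction is exact and the second is well over 100 km away.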

Files (23.6 GB)

Size       Checksum                                 Name
23.6 GB    md5:1003d85155b0a9c5d378af82d47289b0
403.1 kB   md5:e344b098676d31d855ef73b06ce334e2     xcdc_recordings.csv
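After downloading, each file can be checked against its published MD5 checksum. The sketch below uses Python's standard `hashlib` and reads the file in chunks so the 23.6 GB archive need not fit in memory. The helper names `md5sum` and `verify` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in fixed-size chunks (1 MiB by default).
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected: str) -> bool:
    # Accept checksums written as "md5:<hex>" as well as bare hex digests.
    expected = expected.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected
```

Usage, assuming the CSV was saved to the current directory: `verify(Path("xcdc_recordings.csv"), "md5:e344b098676d31d855ef73b06ce334e2")`.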

Additional details

Related works

Is source of
data (Other)

Dates

Created
2025-05-14

Software

Repository URL
https://github.com/cvl-umass/nat-sound2loc-benchmark
Programming language
Python
Development Status
Active