The Xeno-Canto Dawn Chorus (XCDC) Dataset
Description
We construct a dataset of species-rich audio by sampling dawn chorus recordings from Xeno-Canto, a global archive of natural sound recordings. The dawn chorus is a period of intense, multi-species vocal activity that occurs near sunrise, particularly during the breeding season. To isolate species-rich recordings, we select recordings that (1) were made during the spring dawn chorus, (2) are at least 3 minutes long, and (3) are annotated with at least 10 distinct species. This filtering yields 576 recordings with associated geographic coordinates and species labels. We refer to the resulting dataset as Xeno-Canto Dawn Chorus (XCDC), a new benchmark for evaluating geolocation from species-rich soundscapes.
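The three selection criteria amount to a simple conjunctive filter. The sketch below illustrates that filter under stated assumptions: the `Recording` type and its field names are hypothetical (they are not the Xeno-Canto API schema or the columns of the released CSV), and whether a recording falls within the spring dawn chorus is assumed to have been decided upstream and stored as a boolean flag.

```python
from dataclasses import dataclass, field

# Hypothetical record type: field names are illustrative assumptions,
# not the Xeno-Canto API schema or the XCDC CSV columns.
@dataclass
class Recording:
    recording_id: str
    is_spring_dawn_chorus: bool  # assumed to be decided upstream
    duration_s: float            # recording length in seconds
    species: list[str] = field(default_factory=list)  # annotated species labels

MIN_DURATION_S = 3 * 60  # criterion (2): at least 3 minutes
MIN_SPECIES = 10         # criterion (3): at least 10 distinct species


def is_species_rich(rec: Recording) -> bool:
    """Return True if a recording meets all three selection criteria."""
    return (
        rec.is_spring_dawn_chorus                 # criterion (1)
        and rec.duration_s >= MIN_DURATION_S      # criterion (2)
        and len(set(rec.species)) >= MIN_SPECIES  # criterion (3): count distinct labels only
    )


def select_recordings(recs: list[Recording]) -> list[Recording]:
    """Keep only the recordings that pass the filter."""
    return [r for r in recs if is_species_rich(r)]


# Usage with toy records:
recs = [
    Recording("a", True, 240.0, [f"sp{i}" for i in range(12)]),
    Recording("b", True, 120.0, [f"sp{i}" for i in range(12)]),   # too short
    Recording("c", True, 300.0, ["sp1"] * 15),                    # one distinct species
    Recording("d", False, 600.0, [f"sp{i}" for i in range(20)]),  # not dawn chorus
]
print([r.recording_id for r in select_recordings(recs)])  # ['a']
```

Counting `set(rec.species)` rather than the raw label list matters because a single species may be annotated several times in one recording; only distinct species count toward the threshold of 10.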
For detailed documentation of the dataset, please refer to the following repository: https://github.com/cvl-umass/nat-sound2loc-benchmark. For the benchmarking codebase, please refer to the following repository: https://github.com/cvl-umass/nat-sound2loc-code.
Files (23.6 GB total)
| Name | Size | MD5 checksum |
|---|---|---|
|  | 23.6 GB | 1003d85155b0a9c5d378af82d47289b0 |
| xcdc_recordings.csv | 403.1 kB | e344b098676d31d855ef73b06ce334e2 |
Additional details
Related works
- Is source of: data (Other)
Dates
- Created: 2025-05-14
Software
- Repository URL: https://github.com/cvl-umass/nat-sound2loc-benchmark
- Programming language: Python
- Development Status: Active