A collection of fully-annotated soundscape recordings from the southern Sierra Nevada mountain range
- 1. Institute for Bird Populations
- 2. K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University
- 3. Sequoia and Kings Canyon National Parks
- 4. Natural Sounds and Night Skies Division, National Park Service
- 5. Department of Evolution and Ecology, University of California, Davis
Description
This collection contains 100 soundscape recordings, each 10 minutes long, annotated with 10,296 bounding box labels for 21 bird species from the Western United States. The data were recorded in 2015 at the southern end of the Sierra Nevada mountain range in California, USA. This collection was featured as test data in the 2020 BirdCLEF and Kaggle Birdcall Identification competition and can be used primarily for training and evaluating machine learning algorithms.
Data collection
The recordings were made in Sequoia and Kings Canyon National Parks, two contiguous national parks in the southern Sierra Nevada mountain range in California, USA. The focus of the acoustic study was the high-elevation region of the Parks; specifically, the headwater lake basins above 3,000 m in elevation. The original intent of the study was to monitor seasonal activity of birds and bats at lakes containing trout and lakes without trout, because the cascading impacts of trout on the adjacent terrestrial zone remain poorly understood. Soundscapes were recorded continuously, 24 h a day, at 10 lakes (5 fishless, 5 fish-containing) throughout Sequoia and Kings Canyon National Parks during June-September 2015. Because the recording locations are extremely difficult to access, Song Meter SM2+ units (Wildlife Acoustics, USA) were powered by custom-made solar panels so that batteries did not need to be swapped. The Song Meters continuously recorded mono-channel, 16-bit uncompressed WAVE files at a 48 kHz sampling rate. For this collection, recordings were resampled to 32 kHz and converted to FLAC.
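The recording parameters above allow a quick sanity check of the expected data volume. The following is a minimal sketch; it assumes the 16-bit sample depth of the original WAVE files was kept after resampling, which the description does not state explicitly.

```python
# Back-of-the-envelope figures for the audio in this collection.
ORIGINAL_RATE_HZ = 48_000   # recorder sampling rate (from the description)
SAMPLE_RATE_HZ = 32_000     # sampling rate after resampling (from the description)
DURATION_S = 10 * 60        # each recording is 10 minutes long
NUM_RECORDINGS = 100
BYTES_PER_SAMPLE = 2        # ASSUMPTION: 16-bit depth retained after resampling

samples_per_file = SAMPLE_RATE_HZ * DURATION_S
pcm_bytes_per_file = samples_per_file * BYTES_PER_SAMPLE
pcm_bytes_total = pcm_bytes_per_file * NUM_RECORDINGS
nyquist_hz = SAMPLE_RATE_HZ / 2  # highest frequency representable at 32 kHz

print(f"samples per recording: {samples_per_file:,}")
print(f"uncompressed size per recording: {pcm_bytes_per_file / 1e6:.1f} MB")
print(f"uncompressed size of all recordings: {pcm_bytes_total / 1e9:.2f} GB")
print(f"Nyquist frequency: {nyquist_hz / 1000:.0f} kHz")
print(f"resampling ratio: {SAMPLE_RATE_HZ / ORIGINAL_RATE_HZ:.3f}")
```

Under that assumption, the decoded audio occupies about 3.84 GB in total, and annotated frequencies cannot exceed 16 kHz.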
Sampling and annotation protocol
A total of 100 10-minute audio segments recorded between July 9 and 12, 2015, during morning hours (06:10-09:10 PDT) were selected at random from all 10 sites. Annotators were asked to box every bird call they could recognize, ignoring those that were too faint or unidentifiable. Every sound that could not be confidently assigned an identity was reviewed with 1-2 other experts in bird identification. To minimize observer bias, all identifying information about the location, date and time of the recordings was hidden from the annotator. Raven Pro software was used to annotate the data. The provided labels contain full bird calls that are boxed in time and frequency. In this collection, we use eBird species codes as labels, following the 2021 eBird taxonomy (Clements list). Unidentifiable calls have been marked with “????” and were added as bounding box labels to the ground truth annotations. Parts of this dataset have previously been used in the 2020 BirdCLEF and Kaggle Birdcall Identification competition.
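Because each label is a box in time and frequency, a common way to use such annotations is to ask which labels overlap a given time window of a recording. The sketch below illustrates that idea only; the `BoundingBox` class, the `labels_in_window` helper, the window boundaries, and the placeholder labels `sp_a` and `sp_b` are hypothetical and are not part of this collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One annotation: a call boxed in time (seconds) and frequency (Hz)."""
    start_s: float
    end_s: float
    low_hz: float
    high_hz: float
    label: str  # eBird species code, or "????" for unidentifiable calls

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

    @property
    def bandwidth_hz(self) -> float:
        return self.high_hz - self.low_hz


def labels_in_window(boxes, window_start_s, window_end_s):
    """Return the sorted set of labels whose boxes overlap the time window."""
    return sorted({
        b.label
        for b in boxes
        if b.start_s < window_end_s and b.end_s > window_start_s
    })


# Hypothetical boxes with placeholder labels (not real eBird codes).
boxes = [
    BoundingBox(4.0, 6.5, 2000.0, 5000.0, "sp_a"),
    BoundingBox(12.0, 13.0, 1000.0, 3000.0, "sp_b"),
]
print(labels_in_window(boxes, 5.0, 10.0))
print(boxes[0].duration_s, boxes[0].bandwidth_hz)
```

Boxes that only touch a window boundary are not counted as overlapping here; whether that is the right choice depends on the evaluation protocol being used.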
Files in this collection
Audio recordings can be accessed by downloading and extracting the “soundscape_data.zip” file. Soundscape recording filenames contain a sequential file ID, the recording date and a timestamp in PDT (UTC-7). As an example, the file “HSN_001_20150708_061805.flac” has sequential ID 001 and was recorded on July 8, 2015 at 06:18:05 PDT. Ground truth annotations are listed in “annotations.csv”, where each line specifies the corresponding filename, start and end time in seconds, low and high frequency in Hertz and an eBird species code. These species codes can be mapped to the scientific and common name of each species using the “species.csv” file. The approximate recording location, given as longitude and latitude, can be found in the “recording_location.txt” file.
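The filename convention and annotation layout described above could be parsed along the following lines. This is a sketch under stated assumptions: the regular expression generalizes from the single example filename, “annotations.csv” is assumed to start with a header row and to list its columns in the order given in the description, and the header names and data values in `SAMPLE` are made up for illustration.

```python
import csv
import io
import re
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # filenames use PDT (UTC-7)

# ASSUMPTION: all files follow the pattern of the one documented example.
FILENAME_RE = re.compile(r"^HSN_(\d{3})_(\d{8})_(\d{6})\.flac$")


def parse_filename(name):
    """Split a recording filename into (file ID, timezone-aware start time)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    file_id, date_part, time_part = match.groups()
    start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return file_id, start.replace(tzinfo=PDT)


def read_annotations(handle):
    """Yield one dict per annotation row from an open CSV file object."""
    reader = csv.reader(handle)
    next(reader)  # ASSUMPTION: the first row is a header
    for row in reader:
        # ASSUMPTION: column order matches the description above.
        filename, start_s, end_s, low_hz, high_hz, code = row[:6]
        yield {
            "filename": filename,
            "start_s": float(start_s),
            "end_s": float(end_s),
            "low_hz": float(low_hz),
            "high_hz": float(high_hz),
            "species_code": code,
        }


# Hypothetical header names and values, for illustration only.
SAMPLE = (
    "filename,start,end,low,high,code\n"
    "HSN_001_20150708_061805.flac,4.0,6.5,2000,5000,????\n"
)

file_id, started = parse_filename("HSN_001_20150708_061805.flac")
print(file_id, started.isoformat())
for annotation in read_annotations(io.StringIO(SAMPLE)):
    print(annotation)
```

With the real file, `read_annotations(open("annotations.csv", newline=""))` would be the equivalent call, provided the assumptions about the header and column order hold.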
Acknowledgements
Compiling this extensive dataset was a major undertaking, and we are very thankful to the domain experts who helped to collect and manually annotate the data for this collection (individual contributors in alphabetical order): Anna Calderón, Thomas Hahn, Ruoshi Huang, Angelly Tovar.
Files (1.4 GB)
Checksum | Size |
---|---|
md5:0f300f5536784cc34f51f9f42e80756e | 631.2 kB |
md5:4bb248f747fe43a4578052fc7853deaf | 150.9 kB |
md5:bf5aae95885ee85fcdcc0a408e45bfe9 | 105 Bytes |
md5:b7796e5e28d1f6b1a8bb16f0ba294c9d | 1.4 GB |
md5:7b534749a91070a9180e7d32be7f3ea9 | 969 Bytes |
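The MD5 checksums in the file listing can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib` module follows; the helper names are illustrative, and pairing the 1.4 GB entry with “soundscape_data.zip” in the usage comment is an assumption based on file size, since the listing does not show which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (ASSUMPTION: the 1.4 GB entry is soundscape_data.zip):
# ok = matches_listing("soundscape_data.zip",
#                      "md5:b7796e5e28d1f6b1a8bb16f0ba294c9d")
```

Reading in chunks keeps memory use low, which matters for the 1.4 GB archive.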