A collection of annotated soundscape recordings from western Kenya
Description
This collection contains 35 soundscape recordings with a total duration of 32 hours, annotated with 10,294 labels for 176 different bird species from western Kenya. The data were recorded in 2021 and 2022 west and southwest of Lake Baringo in Baringo County, Kenya. Parts of this collection were featured as test data in the 2023 BirdCLEF competition, and the collection is primarily intended for training and evaluating machine learning algorithms.
Data collection
For this collection, AudioMoths and SWIFT recording units were deployed at multiple locations west and southwest of Lake Baringo, Baringo County, Kenya, between December 2021 and February 2022. Recording locations cover a variety of habitats, from open grasslands to semi-arid scrubland and mountain forests. Recordings were originally sampled at 48 kHz and converted to MP3 for faster file transfer. For publication, all files were resampled to 32 kHz and converted to FLAC.
Sampling and annotation protocol
A total of 32 hours of audio from various sites west and southwest of Lake Baringo were selected for annotation. Annotators were tasked with identifying and labeling each bird call they could discern, excluding any calls that were too weak or indiscernible. The annotation process was carried out using Audacity. Provided labels mark the center of each bird call. In this collection, we use eBird species codes as labels, following the 2021 eBird taxonomy (Clements list). Parts of this dataset have previously been used in the 2023 BirdCLEF competition.
Files in this collection
Audio recordings can be accessed by downloading and extracting the “soundscape_data.zip” file. Soundscape recording filenames contain a sequential file ID, the recording date, and a timestamp in EAT (UTC+3). As an example, the file “KEN_001_20211207_153852.flac” has sequential ID 001 and was recorded on December 7th 2021 at 15:38:52 EAT. Ground truth annotations are listed in “annotations.csv”, where each line specifies the corresponding filename, the start and end time in seconds, and an eBird species code. These species codes can be mapped to the scientific and common names of a species using the “species.csv” file. The approximate recording location, given as longitude and latitude, can be found in the “recording_location.txt” file.
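The filename convention described above can be decoded programmatically. The following is a minimal Python sketch, not part of the published collection: the regular expression is inferred from the single example filename given here, and the function names `parse_soundscape_filename` and `annotation_time` are hypothetical helpers introduced for illustration. It also assumes that annotation times in “annotations.csv” are offsets in seconds from the start of the recording.

```python
import re
from datetime import datetime, timedelta, timezone

# East Africa Time is a fixed UTC+3 offset.
EAT = timezone(timedelta(hours=3), name="EAT")

# Assumption: all files follow the pattern of the one documented example,
# "KEN_<sequential ID>_<YYYYMMDD>_<HHMMSS>.flac".
FILENAME_RE = re.compile(r"^KEN_(\d+)_(\d{8})_(\d{6})\.flac$")


def parse_soundscape_filename(filename):
    """Return (file_id, recording_start) for a soundscape filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename!r}")
    file_id, date_part, time_part = match.groups()
    start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return file_id, start.replace(tzinfo=EAT)


def annotation_time(filename, offset_seconds):
    """Convert an offset within a recording to an absolute EAT timestamp."""
    _, start = parse_soundscape_filename(filename)
    return start + timedelta(seconds=offset_seconds)


file_id, start = parse_soundscape_filename("KEN_001_20211207_153852.flac")
print(file_id, start.isoformat())  # 001 2021-12-07T15:38:52+03:00
```

Keeping the timestamp timezone-aware makes it straightforward to convert to UTC or to compare recordings across days.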
Acknowledgements
Compiling this extensive dataset was a major undertaking, and we are very thankful to the domain experts who helped to collect and manually annotate the data for this collection. In particular, our thanks go to Francis Cherutich for setting up recording units, collecting and annotating data, and to Alain Jacot for assisting in programming the units and transporting the recorders to Kenya.
Files
Total size: 3.3 GB
MD5 checksum | Size |
---|---|
md5:798646b51e2527f60613b7c0b37a693d | 517.2 kB |
md5:cc09fc8409e3259f56b1071d829bbd7b | 113.1 kB |
md5:1c58afe7467b20f4b6aa3a7f00ae53fe | 142 Bytes |
md5:eef48cc193bf8a73415582a33d349c11 | 3.3 GB |
md5:4deae844ac992470b8336e6f91340b99 | 8.4 kB |
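The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using only the standard library; the function names `md5_of_file` and `matches_listed_checksum` are hypothetical helpers, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    # Drop the optional "md5:" prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low, which matters for the 3.3 GB archive.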