Published September 14, 2022 | Version 1
Dataset | Open access

A collection of fully-annotated soundscape recordings from the Island of Hawai'i

  • 1. Listening Observatory for Hawaiian Ecosystems, University of Hawai'i at Hilo
  • 2. K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University

Description

This collection contains 635 soundscape recordings totaling almost 51 hours. Expert ornithologists annotated them with 59,583 bounding-box labels for 27 bird species from the Hawaiian Islands, including 6 threatened or endangered native species. The data were recorded between 2016 and 2022 at four sites across Hawai‘i Island. Parts of this collection were used as test data in the 2022 BirdCLEF competition. The collection is intended primarily for training and evaluating machine learning algorithms.

Data collection

Soundscapes for this collection were recorded for various research projects by the Listening Observatory for Hawaiian Ecosystems (LOHE) at the University of Hawai‘i at Hilo. Recordings were made with Wildlife Acoustics Inc. Song Meters (models 2, 4, or Mini) as 16-bit WAV files at a sampling rate of 44.1 kHz, using each model's default gain settings. Further details for each recording, such as recording location and habitat type, are provided in the metadata. Recordings range in length from just under 1 minute to 9 minutes. For this collection, all audio was standardized: converted to FLAC and resampled to 32 kHz.
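Because every file is stored at 32 kHz, a time in seconds converts to a sample offset by multiplying by 32,000, and no frequency content above the 16 kHz Nyquist limit survives resampling. A minimal sketch of these conversions follows; the function names are hypothetical, and clamping a label's frequency bound to the Nyquist limit is a defensive choice, not something this collection documents as necessary.

```python
SAMPLE_RATE_HZ = 32_000          # all FLAC files in this collection were resampled to 32 kHz
NYQUIST_HZ = SAMPLE_RATE_HZ / 2  # highest frequency a 32 kHz signal can represent (16 kHz)


def seconds_to_sample(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Map a time offset in seconds to the nearest sample index."""
    return round(seconds * sample_rate)


def sample_to_seconds(index: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Map a sample index back to a time offset in seconds."""
    return index / sample_rate


def clamp_to_nyquist(freq_hz: float) -> float:
    """Limit a frequency bound to the range 0 .. Nyquist of the resampled audio."""
    return min(max(freq_hz, 0.0), NYQUIST_HZ)
```

For example, the longest recordings (9 minutes, or 540 s) span `seconds_to_sample(540)` samples, i.e. 17,280,000.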

Sampling and annotation protocol

This collection is a subset of the files recorded over the course of the LOHE lab's studies. Recordings were selected for annotation by listening to them and visually scanning spectrograms in Raven Pro software for the target species of the research project under which each recording was collected. Recordings that contained no vocalizations of those target species were excluded from full annotation and therefore from this collection.

Using Raven Pro, annotators were asked to draw a selection box around every bird call they could identify, skipping calls that were too faint or unidentifiable at a spectrogram window size of 700 points. Each label encloses a complete call in both time and frequency. Annotators were allowed to combine consecutive calls of the same species into a single bounding box if the pauses between calls were shorter than 0.5 seconds. We converted the labels to eBird species codes, following the 2021 eBird taxonomy (Clements list).
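The 0.5-second combination rule amounts to an interval-merging pass run separately for each species. The sketch below is illustrative only: annotators applied the rule by hand and were allowed, not required, to combine calls, whereas this code always merges. The `Box` type and `merge_calls` function are hypothetical, and widening the frequency range to cover both calls is an assumption about how a combined box would be drawn.

```python
from dataclasses import dataclass

MAX_GAP_S = 0.5  # calls separated by a shorter pause may share one box


@dataclass
class Box:
    species: str  # eBird species code
    start: float  # seconds
    end: float    # seconds
    low: float    # Hz
    high: float   # Hz


def merge_calls(boxes: list[Box], max_gap: float = MAX_GAP_S) -> list[Box]:
    """Merge consecutive boxes of the same species whose gap is shorter than max_gap."""
    merged: list[Box] = []
    last_by_species: dict[str, Box] = {}  # most recent merged box per species
    for box in sorted(boxes, key=lambda b: b.start):
        prev = last_by_species.get(box.species)
        if prev is not None and box.start - prev.end < max_gap:
            # Extend the previous box in time and widen it to cover both calls.
            prev.end = max(prev.end, box.end)
            prev.low = min(prev.low, box.low)
            prev.high = max(prev.high, box.high)
        else:
            # Copy so the caller's input boxes are never mutated.
            new = Box(box.species, box.start, box.end, box.low, box.high)
            merged.append(new)
            last_by_species[box.species] = new
    return merged
```

Tracking the last box per species means a call of another species in between does not break up a run of same-species calls.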

Files in this collection

Audio recordings are available by downloading and extracting the “soundscape_data.zip” file. Each soundscape filename contains a sequential file ID, site ID, recording date, and start time in HST. For example, the file “UHH_001_S01_20161121_150000.flac” has sequential ID 001 and was recorded at site S01 on November 21, 2016, at 15:00:00 HST. Ground-truth annotations are listed in “annotations.csv”, where each line gives the filename, start and end time in seconds, low and high frequency in hertz, and an eBird species code. The “species.csv” file maps these codes to each species' scientific and common names. Approximate recording locations in Universal Transverse Mercator (UTM) coordinates, along with other metadata, are provided in the “recording_location.csv” file.
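The filename convention and the column order of “annotations.csv” can be read with Python's standard library. A minimal sketch, assuming the CSV has one header row and its columns appear in the order listed above (filename, start, end, low frequency, high frequency, species code). The dictionary keys and function names are hypothetical, and the regular expression's tolerance for other prefixes, ID widths, and site labels is an assumption generalized from the single example filename.

```python
import csv
import re
from collections import Counter
from datetime import datetime
from typing import Iterable

# Generalized from the one documented example, "UHH_001_S01_20161121_150000.flac":
# <prefix>_<sequential ID>_<site ID>_<YYYYMMDD>_<HHMMSS>.flac, time in HST (assumed layout).
FILENAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_(?P<file_id>\d+)_(?P<site>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})_(?P<time>\d{6})\.flac$"
)


def parse_filename(name: str) -> dict:
    """Split a soundscape filename into sequential ID, site ID, and local start time."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    # Naive datetime: the filename gives HST wall-clock time without a UTC offset.
    start = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return {"file_id": m["file_id"], "site": m["site"], "start_hst": start}


def load_annotations(lines: Iterable[str]) -> list[dict]:
    """Parse annotation rows, assuming one header row and the documented column order."""
    reader = csv.reader(lines)
    next(reader)  # skip the assumed header row
    records = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        filename, start, end, low, high, species = row[:6]
        records.append({
            "filename": filename,
            "start_s": float(start), "end_s": float(end),
            "low_hz": float(low), "high_hz": float(high),
            "species": species,
        })
    return records


def labels_per_species(records: list[dict]) -> Counter:
    """Count bounding-box labels per eBird species code."""
    return Counter(r["species"] for r in records)
```

To load the real file, pass the file object returned by `open("annotations.csv", newline="")` to `load_annotations`; the `csv` module accepts any iterable of lines.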

Acknowledgements 

Compiling this extensive dataset was a major undertaking, and we are very thankful to the domain experts who helped to collect and manually annotate the data for this collection. Specifically, we want to thank Charlotte Forbes-Perry with the Pacific Cooperative Studies Unit, University of Hawai'i at Hawai‘i Volcanoes National Park as well as the following current and past members of the LOHE lab (in alphabetical order): Keith Burnett, Saxony Charlot, Noah Hunt, Caleb Kow, Elizabeth Lough, and Bret Mossman.

Access and permits to record soundscapes were provided by (in alphabetical order): Hakalau Forest National Wildlife Refuge, the State of Hawai‘i Department of Land and Natural Resources Division of Forestry and Wildlife, and the U.S. Fish and Wildlife Service.

We would also like to acknowledge our funding sources (in alphabetical order): The National Park Service Inventory and Monitoring Division, the National Science Foundation, and the U.S. Army Engineer Research and Development Center.

Files


Files (5.8 GB)

Size        Checksum
3.6 MB      md5:03b5550b6788734c5fe9728a1abc0ca2
149.9 kB    md5:b8eff939d23788b123bb85600b682842
516 Bytes   md5:7d6db82888c3faff3a40efa03f91ffda
5.8 GB      md5:79cec7baf06770acf0ac0d519de070ca
1.2 kB      md5:5c21df6024c41556cd334ed3d9efdf42
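The MD5 checksums listed above can be used to confirm that a download arrived intact. A minimal sketch using Python's standard `hashlib` module; it reads the file in chunks so the 5.8 GB archive never has to fit in memory. The helper names are hypothetical.

```python
import hashlib
from typing import Iterable

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large archives are not loaded into memory


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Compute a file's MD5 digest by streaming it in fixed-size chunks."""
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        return md5_hexdigest(iter(lambda: f.read(chunk_size), b""))


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed checksum; an optional "md5:" prefix is ignored."""
    expected = expected.removeprefix("md5:").lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("soundscape_data.zip", "md5:...")` with the checksum listed for that file on the record page returns `True` only if the download is byte-for-byte complete.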