Audio tagging of avian dawn chorus recordings in California, Oregon, and Washington
Creators
- 1. Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR, United States ; Pacific Northwest Research Station, USDA Forest Service, Corvallis, OR, United States
- 2. Google DeepMind
- 3. Conservation Metrics, Inc., Santa Cruz, CA, United States
- 4. Carlson College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States
- 5. Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, Corvallis, OR, United States
- 6. Pacific Northwest Research Station, USDA Forest Service, Corvallis, OR, United States
- 7. Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR, United States
- 8. Pacific Northwest Research Station, USDA Forest Service, Corvallis, OR, United States; Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR, United States; Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, Corvallis, OR, United States
Description
General Summary
This acoustic data collection includes 1,575 5-minute soundscape recordings randomly selected from passive acoustic recordings made at 525 sites during 2022 on federally managed lands in western California, Oregon, and Washington, USA. We fully labeled 141 recordings (11.75 hrs) with 39,717 annotations for 118 sound types, including 58 avian species, two mammalian species, six aggregated biotic sounds, and eight non-biotic sound types. An additional 215 recordings were partially annotated with 1,466 annotations. The remaining unlabeled recordings have been included to facilitate novel research applications and methodological evaluations. Beyond the labeled soundscape recordings, we have included township and range identifications and 38 environmental covariates for each recording location.
Data Collection
Lesmeister et al. (2021) collected passive acoustic recordings during 2022 in support of long-term monitoring of federally threatened northern spotted owl (Strix occidentalis caurina) populations under the Northwest Forest Plan Effectiveness Monitoring Program (U. S. Fish and Wildlife Service 1990, U. S. Department of Agriculture and U. S. Department of the Interior 1994). These data were collected at 643 hexagons that were randomly selected from a tessellation of 5 km² hexagons covering the entire range of the northern spotted owl (Northern California, Oregon, Washington), subject to the constraint that each hexagon contain ≥ 50% forest-capable lands (defined as forested lands or lands capable of developing closed-canopy forests) and be ≥ 25% federally owned (Davis et al., 2011).
Each hexagon was sampled by four Song Meter 4 (SM4) acoustic recording units (Wildlife Acoustics, Maynard, MA) deployed in a standardized spatial arrangement, such that recorders on a site were placed ≥ 500 m apart and were ≥ 200 m from the edge of the sampling hexagon boundary. Recorders were mounted to small trees (15 – 20 cm diameter at breast height) approximately 1.5 m above the ground and were placed on mid-to-upper slopes and ≥ 50 m from roads, trails, and streams. The SM4 devices each have two built-in omnidirectional microphones with a signal-to-noise ratio of 80 dB, typical at 1 kHz, and a recording bandwidth of 20 Hz – 48 kHz. Each device recorded ~11 hours of audio daily for six weeks from March to August at a sampling rate of 32 kHz. The daily recording schedule included a 4-hour window from two hours before sunrise to two hours after sunrise, a 4-hour window from one hour before sunset to 3 hours after sunset, and 10-minute recordings outside the two longer recording blocks at the start of every hour.
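As a rough check on the ~11 hours of audio per day, the daily schedule described above can be sketched as follows. This is an illustrative calculation, not the recorder configuration itself: the function name, the example sunrise and sunset times, and the rule that an hourly 10-minute clip is skipped only when its start time falls inside one of the two long blocks are all assumptions.

```python
def daily_schedule(sunrise_min, sunset_min):
    """Sketch of one day's recording schedule (times in minutes after midnight)."""
    # The two long blocks described in the text.
    dawn = (sunrise_min - 120, sunrise_min + 120)   # 2 h before to 2 h after sunrise
    dusk = (sunset_min - 60, sunset_min + 180)      # 1 h before to 3 h after sunset
    blocks = [dawn, dusk]

    def in_block(t):
        # Also test t + 1440 so a dusk block that runs past midnight is handled.
        return any(start <= x < end
                   for start, end in blocks
                   for x in (t, t + 1440))

    # Assumed rule: one 10-minute clip at the top of every hour outside the blocks.
    hourly_starts = [h * 60 for h in range(24) if not in_block(h * 60)]
    total_minutes = sum(end - start for start, end in blocks) + 10 * len(hourly_starts)
    return blocks, hourly_starts, total_minutes


# Hypothetical day: sunrise at 05:30, sunset at 20:30.
blocks, hourly_starts, total_minutes = daily_schedule(5 * 60 + 30, 20 * 60 + 30)
print(len(hourly_starts), total_minutes, round(total_minutes / 60, 2))
```

For this example day the sketch gives 16 hourly clips and 640 minutes in total (about 10.7 hours), in line with the ~11 hours stated above.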
Data Sampling
The goal of this project was to develop a tagged audio dataset (hereafter project dataset) focused on the avian dawn chorus, which is an ecologically important period for the study of avian behavior (McNamara et al. 1987, Staicer et al. 1996, Zhang et al. 2016) and for monitoring avian biodiversity (Bibby et al. 2000), but remains a challenging problem for acoustic classification systems (Duan et al. 2013, Stowell 2022). Passive acoustic monitoring on our sites occurs throughout the day, so we filtered the full dataset to recordings collected between May and August during the hour immediately after sunrise. From the recordings meeting our filtering criteria, we randomly selected three 5-minute files from each site, which were assigned ordinal labels 'A,' 'B,' or 'C.' The final project dataset comprised 131.25 hours of acoustic data.
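The sampling step and the resulting dataset size can be illustrated with a short sketch. It assumes the candidate recordings have already been filtered to the hour after sunrise; the function names, the seed, and the candidate file names are hypothetical and are not taken from the project's own tooling.

```python
import random

CLIP_MINUTES = 5
CLIPS_PER_SITE = 3


def draw_replicates(candidates_by_site, seed=0):
    """Randomly pick three candidate files per site and label them A, B, C."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    selection = {}
    for site, files in sorted(candidates_by_site.items()):
        chosen = rng.sample(files, CLIPS_PER_SITE)  # sampling without replacement
        selection[site] = dict(zip("ABC", chosen))
    return selection


def total_hours(n_sites):
    """Total audio duration for n_sites, three 5-minute clips each."""
    return n_sites * CLIPS_PER_SITE * CLIP_MINUTES / 60


# Hypothetical candidate files for two sites.
candidates = {
    1: [f"site1_clip{i}.wav" for i in range(10)],
    2: [f"site2_clip{i}.wav" for i in range(7)],
}
print(draw_replicates(candidates))
print(total_hours(525))  # 525 sites x 3 clips x 5 min = 131.25 hours
```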
Annotation Protocol
We randomly selected 141 sites from the project dataset and fully annotated each recording at a 2-second resolution. We applied labels to each 2-second window of the selected recordings following a predefined sound phonology library (available in the 'metadata.tsv' file), which concatenated the 2021 eBird taxonomy codes (Clements list; Clements et al. 2021) with standardized sonotype codes that incremented depending on the species repertoire (e.g., 'call_1,' 'song_1,' 'drum_1'). For example, 'herthr_song_1' is the label for Hermit Thrush, song_1. Unknown signals were labeled 'unknown,' and clips with no biotic signals (or noise classes of interest documented in metadata.tsv) were labeled 'empty.' Windows were labeled 'complete' and considered fully annotated when every signal was assigned an annotation. Files were deemed fully annotated when every 2-second window contained the 'complete' label.
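The labeling convention above can be sketched in code. The sketch assumes that species labels always follow the 'code_sonotype_index' pattern shown for 'herthr_song_1'; noise classes listed in metadata.tsv may use other forms, and the function names and the window-indexed input structure are hypothetical.

```python
SPECIAL_LABELS = {"unknown", "empty", "complete"}
WINDOW_SECONDS = 2
CLIP_SECONDS = 5 * 60


def parse_label(label):
    """Split a label such as 'herthr_song_1' into its parts."""
    if label in SPECIAL_LABELS:
        return {"kind": label}
    # Assumed pattern: <eBird code>_<sonotype>_<index>; raises ValueError otherwise.
    code, sonotype, index = label.rsplit("_", 2)
    return {"kind": "sound", "code": code, "sonotype": sonotype, "index": int(index)}


def is_fully_annotated(labels_by_window):
    """True when every 2-second window of a 5-minute file carries 'complete'.

    labels_by_window maps a window index (0..149) to the set of labels
    applied to that window.
    """
    n_windows = CLIP_SECONDS // WINDOW_SECONDS  # 150 windows per file
    return all("complete" in labels_by_window.get(i, ()) for i in range(n_windows))


print(parse_label("herthr_song_1"))
```

A file with even one window lacking the 'complete' label would be treated as partially annotated under this check.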
Environmental Covariates
Sampling locations will not be published to afford protections for Federally Threatened or Endangered species which may occur on our sites. However, we provide the State, Township, and Range for each sampling location along with the site-specific values for 38 forest structure, topographic, and climatic environmental covariates developed by the Landscape Ecology, Modeling, Mapping, and Analysis group in the Pacific Northwest (https://lemma.forestry.oregonstate.edu/data; Ohmann and Gregory 2002). State, Township, and Range values are sufficient to explore geographic variation in species- or community-specific call and song phenology and the extracted environmental covariates may provide useful contextual information for novel machine-learning developments (Liu et al. 2018).
Description of Data Format
The fully annotated audio files can be accessed by downloading and extracting "annotated_recordings.zip." Partially annotated and non-annotated audio files can be accessed by downloading and extracting "additional_recordings_part_1.zip" or "additional_recordings_part_2.zip." Acoustic file names contain site and replicate indicators, such that file "Site_001_Rep_A.wav" was recorded on site 1 and is the A replicate random draw from the available set of dawn chorus recordings. The site and replicate numbers link to additional recording information in "files.tsv," annotations in "annotations.tsv" and "partial_annotations.tsv," as well as site- and replicate-specific environmental characteristics in "environmental_characteristics.tsv."
Metadata describing sound classes and environmental characteristics can be found in "metadata.tsv" and "environmental_characteristics_metadata.tsv."
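A minimal sketch of how the file-name convention could be used to link audio files to table rows is shown below. The file-name pattern follows the "Site_001_Rep_A.wav" example; the column names 'site' and 'replicate' are assumptions, since the table headers are not described here, and should be replaced with the actual headers in the .tsv files.

```python
import re

# Pattern based on the example file name "Site_001_Rep_A.wav".
FILENAME_PATTERN = re.compile(r"^Site_(\d+)_Rep_([ABC])\.wav$")


def parse_filename(name):
    """Return (site number, replicate letter) for a recording file name."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), match.group(2)


def index_rows(rows, site_key="site", rep_key="replicate"):
    """Group table rows by (site, replicate).

    rows is an iterable of dicts, e.g. from csv.DictReader(f, delimiter="\t").
    The default column names are hypothetical.
    """
    index = {}
    for row in rows:
        key = (int(row[site_key]), row[rep_key])
        index.setdefault(key, []).append(row)
    return index


# Usage with made-up rows standing in for a parsed .tsv table.
rows = [
    {"site": "1", "replicate": "A", "label": "herthr_song_1"},
    {"site": "1", "replicate": "B", "label": "empty"},
]
lookup = index_rows(rows)
print(lookup[parse_filename("Site_001_Rep_A.wav")])
```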
Acknowledgments
Acoustic data collection was funded and carried out by the US Forest Service and the US Bureau of Land Management. Annotation work was funded by Google. We would also like to thank the many biologists who collected and processed the data compiled here. The use of trade or firm names in this publication is for reader information and does not imply endorsement by the U.S. Government of any product or service.
Files
(20.3 GB)
MD5 checksum | Size |
---|---|
md5:ce2ec49cedf38c8cec63f613f28ff948 | 2.4 MB |
md5:a86ade9b105d897b3389fa7ae1df23d3 | 148.3 kB |
md5:af069984901546372f56b125a1e0797d | 1.6 GB |
md5:96821dd3b159e8ff2c6896d709fb81ef | 1.7 GB |
md5:d9832ecbbeeddd19f41a24c5c50ee54c | 900.4 MB |
md5:e5dfc6f22a43bad2e1406d95139f649c | 1.7 GB |
md5:a30b681ece676476ba6c00b815a22aa3 | 1.7 GB |
md5:66ec2ae73ad0c8f222d81dbc265c7c84 | 1.8 GB |
md5:6e6e6eacc148e817b88fa33c566a44c7 | 1.8 GB |
md5:e53a981747a37e6054eb8fe1c5967fef | 1.9 GB |
md5:38111578f406c10dd7ea60c14addb1c6 | 1.8 GB |
md5:1f8ae7d5a822729d5277c41b366820a5 | 1.7 GB |
md5:a32aa803492fbde42a11dec86578b1b3 | 1.8 GB |
md5:e74ff21f756f0f5deca64133fcc0e021 | 1.8 GB |
md5:a476060d06c3682c6d7fa68696f7245f | 30.4 kB |
md5:aae7327ddbb5f799e557ca610a93f763 | 1.3 MB |
md5:b217339ed2548e080ab4d66dd5217c0b | 65.3 kB |
md5:b3ba635fd215c0ddaf478da054d04139 | 353.9 kB |
md5:58be795429bbf94df9481ff3a5fde4ee | 6.8 kB |
md5:dc956d94a1119a7f82b103b7b8638a60 | 290.2 kB |
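The md5 values listed with the files can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names are illustrative, and the path and expected checksum passed to verify_download are whatever file and listed value the reader wants to check.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.md5()
    # Read in 1 MiB chunks so multi-gigabyte archives do not need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a local file against a listed checksum such as 'md5:ce2e...'."""
    expected = expected.split(":", 1)[-1].lower()  # drop an optional 'md5:' prefix
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected
```

A call such as verify_download("annotated_recordings.zip", "md5:...") returns True only when the computed digest matches the listed one.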
Additional details
References
- Bibby, C. J., Burgess, N. D., Hill, D. A., and Mustoe, S. H. (2000). Bird census techniques (2nd ed.). Academic Press.
- Clements, J. F., T. S. Schulenberg, M. J. Iliff, S. M. Billerman, T. A. Fredericks, J. A. Gerbracht, D. Lepage, B. L. Sullivan, and C. L. Wood. 2021. The eBird/Clements checklist of Birds of the World: v2021. Downloaded from https://www.birds.cornell.edu/clementschecklist/download/
- Davis, R.J., Dugger, K.M., Mohoric, S., Evers, L., and Aney, W.C. (2011). Northwest Forest Plan—the first 15 Years (1994-2008): status and trends of Northern Spotted Owl populations and habitat. (Gen. Tech. Rep. No. PNW-GTR-850). U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, Portland, OR
- Duan, S., Zhang, J., Roe, P., Wimmer, J., Dong, X., Truskinger, A., and Towsey, M. (2013). Timed Probabilistic Automaton: A bridge between Raven and Song Scope for automatic species recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 27(2), 1519–1524. https://doi.org/10.1609/aaai.v27i2.18993
- Landscape Ecology Modeling, Mapping, and Analysis (LEMMA) Team. (2020). Gradient Nearest Neighbor (GNN) raster dataset (version 2020.01). Modeled forest vegetation data using direct gradient analysis and nearest neighbor imputation. Retrieved from https://lemma.forestry.oregonstate.edu/data
- Lesmeister, D. B., Appel, C. L., Davis, R. J., Yackulic, C. B., and Ruff, Z. J. (2021). Simulating the effort necessary to detect changes in Northern Spotted Owl (Strix occidentalis caurina) populations using passive acoustic monitoring (Research Paper PNW-RP-618; p. 55). U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station.
- Liu, J., Zhang, Z., and Razavian, N. (2018). Deep EHR: Chronic disease prediction using medical notes. In: Proceedings of the 3rd Machine Learning for Healthcare Conference, Proceedings of Machine Learning Research, pp. 440–464.
- McNamara, J. M., Mace, R. H., and Houston, A. I. (1987). Optimal daily routines of singing and foraging in a bird singing to attract a mate. Behavioral Ecology and Sociobiology, 20(6), 399–405. https://doi.org/10.1007/BF00302982
- Ohmann, J. L., and Gregory, M. J. (2002). Predictive mapping of forest composition and structure with direct gradient analysis and nearest-neighbor imputation in coastal Oregon, U.S.A. Canadian Journal of Forest Research, 32, 725–741.
- Staicer, C. A., Spector, D. A., and Horn, A. G. (1996). The dawn chorus and other diel patterns in acoustic signaling. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and evolution of acoustic communication in birds. Cornell University Press.
- Stowell, D. (2022). Computational bioacoustics with deep learning: a review and roadmap. PeerJ, 10:e13152. https://doi.org/10.7717/peerj.13152
- U. S. Department of Agriculture, and U. S. Department of the Interior. (1994). Northwest Forest Plan - Record of Decision for amendments for Forest Service and Bureau of Land Management planning documents within the range of the Northern Spotted Owl.
- U. S. Fish and Wildlife Service. (1990). 50 CFR part 17 endangered and threatened wildlife and plants; determination of threatened status for northern spotted owl; final rule. Federal Register 55, 26114–26194.
- Zhang, V. Y., Celis-Murillo, A., and Ward, M. P. (2016). Conveying information with one song type: Changes in dawn song performance correspond to different female breeding stages. Bioacoustics, 25(1), 19–28. https://doi.org/10.1080/09524622.2015.1076348