Published April 5, 2024 | Version 1.0.0
Video/Audio Open

Labelled acoustic dataset of roding Eurasian Woodcock (Scolopax rusticola)

  • 1. Forstliche Versuchs- und Forschungsanstalt Baden-Württemberg
  • 2. University of Freiburg

Description

This dataset contains manually labelled audio data of roding Eurasian Woodcock (Scolopax rusticola).

Bioacoustic surveys of roding Eurasian Woodcock were conducted in Baden-Württemberg, Germany, in May and June of 2020 and 2021. The audio data in this collection were used to evaluate BirdNET as a means of automatically analysing large quantities of audio data. The original dataset consisted of 12,236 minutes of recordings, which were reviewed manually. Each call element of a male roding Woodcock (i.e. croak, whistle, chasing male) was annotated. Individual call elements were subsequently clustered into so-called roding events, which are ecologically more meaningful. BirdNET was then tested against this manually labelled dataset.

The dataset uploaded to Zenodo contains:

  • audio data of 2545 woodcock call element selections with a duration of 145 minutes
  • audio data of 782 aggregated woodcock roding events with a duration of 115 minutes 
  • selection tables for call elements and roding events
  • associated metadata

Audio information in between roding events (i.e. non-woodcock audio) is not included for data privacy reasons (see below).

Selections

Woodcock call elements were manually selected/annotated in Raven Pro with bounding boxes. For this dataset, all selections with a duration of less than 3 seconds were extended symmetrically until 3 seconds were reached. This may result in overlapping selections in the case of croaks that are directly followed by a whistle. Signals at the beginning or end of these selections may thus be included twice.
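
The symmetric extension described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names and the example times are hypothetical, and the source does not say how clips near the start or end of a recording were handled, so no clamping is applied here.

```python
def extend_selection(start, end, min_duration=3.0):
    """Pad a (start, end) selection symmetrically to at least min_duration seconds."""
    duration = end - start
    if duration >= min_duration:
        return start, end  # long selections are left unchanged
    pad = (min_duration - duration) / 2.0
    return start - pad, end + pad


def overlap(a, b):
    """Overlap in seconds between two (start, end) intervals, 0.0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical croak directly followed by a whistle (seconds from recording start).
croak = extend_selection(10.0, 11.0)
whistle = extend_selection(11.2, 11.6)
print(croak, whistle, overlap(croak, whistle))
```

In this made-up example the two extended clips share roughly two seconds of audio, which is the double inclusion the paragraph above warns about.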

Roding events

A roding event was defined as a continuous series of Woodcock call elements with a maximum gap of six seconds between consecutive elements. Each event can be interpreted as a roding bird passing the recording location, similar to what a human observer would register in a typical woodcock roding survey. Roding events were not created from the extended 3-second clips described above, but from the original bounding box selections drawn in Raven Pro.
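
The six-second rule can be illustrated with a short sketch. It assumes that the gap is measured from the end of the running event to the start of the next call element and that the grouping is done separately for each recording; the function name and example times are hypothetical.

```python
def group_into_events(elements, max_gap=6.0):
    """Cluster (start, end) call elements of one recording into roding events.

    Returns (event_start, event_end, n_elements) tuples. A new event starts
    whenever the gap to the running event exceeds max_gap seconds.
    """
    events = []
    for start, end in sorted(elements):
        if events and start - events[-1][1] <= max_gap:
            # Close enough to the running event: extend it.
            events[-1][1] = max(events[-1][1], end)
            events[-1][2] += 1
        else:
            events.append([start, end, 1])
    return [tuple(event) for event in events]


# Hypothetical original (unextended) selections in seconds.
calls = [(10.0, 11.0), (12.5, 12.9), (15.0, 16.0), (40.0, 40.4)]
print(group_into_events(calls))
```

The last element in this example is more than six seconds away from the others and therefore forms an event of its own, matching the single-element events mentioned for faint, distant birds.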

Audio files

  • selections.zip: each wav-file contains a single selection. Filenames correspond to the column selec in the table selections.csv.
  • events.zip: each wav-file contains a single roding event, typically consisting of multiple call elements (croaks and/or whistles). In the case of faint signals of distant birds, roding events may consist of a single call element only. Filenames correspond to the column event.id in the table events.csv.

Data collection

All wav-files in this dataset originate from audio files that were recorded with autonomous recording units (ARUs) of the type AudioMoth. ARUs were housed in custom-made waterproof casings (see details and files for 3D printing: https://www.thingiverse.com/thing:6428228). ARUs were programmed to record continuously for 2 hours during dusk and were placed at the edges of forest clearings. The devices were mounted on tree trunks at a height of approximately 1.5 m above ground.

Metadata files

filename content
sites.csv

contains locations of the recording sites. Since exact recording locations cannot be made public, only recording sites (= cells of the 1 km² UTM grid) are provided. CRS: EPSG:25832, ETRS89 / UTM 32N

Data source of the underlying ETRS89 UTM 32N grid: https://gdz.bkg.bund.de/index.php/default/digitale-geodaten/nicht-administrative-gebietseinheiten/geographische-gitter-fur-deutschland-in-utm-projektion-geogitter-national.html

columns

site.id = unique id of recording sites

cellcode = official cellcode of the 1km²-UTM-grid

elevation = mean elevation a.s.l.

x.centroid = x-coordinate of centroid (EPSG: 25832)

y.centroid = y-coordinate of centroid (EPSG: 25832)

wkt.geometry = polygon geometry of the grid cell
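
As a quick consistency check, the centre of a grid cell can be derived from wkt.geometry and compared with x.centroid and y.centroid. The sketch below assumes a simple single-ring POLYGON string with plain decimal coordinates; the exact WKT formatting in sites.csv is not documented here, and the example cell is made up.

```python
import re


def polygon_vertices(wkt):
    """Extract (x, y) vertices from a single-ring WKT POLYGON string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    return [(float(x), float(y)) for x, y in pairs]


def cell_centre(wkt):
    """Centre of the bounding box, which equals the centroid of a rectangular cell."""
    xs, ys = zip(*polygon_vertices(wkt))
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


# Hypothetical 1 km x 1 km cell in EPSG:25832 coordinates (not taken from the dataset).
wkt = ("POLYGON ((430000 5300000, 431000 5300000, 431000 5301000, "
       "430000 5301000, 430000 5300000))")
print(cell_centre(wkt))
```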

arus.csv

metadata of the recording hardware

 

columns

aru.id = unique id of recording device

type = recorder type

manufacturer = manufacturer of recording hardware

hardware.version = hardware version of the recording device

acquisition.date = date the device was purchased (recorded because microphones can degrade over time)

deploys.csv

information on recorder deployment, includes aru settings, location, recording times 

 

columns

deploy.id = unique id of recorder deployment

aru.id = unique id of deployed aru

start.date = date the aru was deployed in the field (YYYY-MM-DD)

end.date = date the aru was collected (YYYY-MM-DD)

firmware = firmware version used in this deployment

rec.periods = number of daily recording periods (corresponds to start.rec1, start.rec2 ...)

sample.rate = sample rate in kHz

gain = gain setting

sleep.duration = duration of stand-by phases in seconds, when set on a sleep/record-cycle

rec.duration = duration of each recording in seconds, when set on a sleep/record-cycle

start.rec1 = start of first recording period (UTC, hh:mm:ss)

end.rec1 = end of first recording period (UTC, hh:mm:ss)

start.rec2 = start of second recording period (UTC, hh:mm:ss)

end.rec2 = end of second recording period (UTC, hh:mm:ss)

site.id = unique id of recording site

recordings.csv

metadata of the audio files from which the roding events originate

 

 columns

recording.id = unique id of the recording

deploy.id = unique id of aru deployment, during which the recording was made

date = date on which the recording was made (YYYY-MM-DD)

time = time of day at which the recording started (UTC, hh:mm:ss)

duration = duration in seconds

sampler.rate = sample rate in kHz

channels = number of channels

bits = bit depth

samples = number of audio samples

gain = gain setting of the aru

voltage = battery voltage of the aru during recording

temperature = ambient temperature during recording 

reviewer = anonymous id of staff who reviewed the file and annotated calls
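
The tables are linked through their id columns: recordings.csv refers to deploys.csv via deploy.id, and deploys.csv refers to sites.csv via site.id and to arus.csv via aru.id. A minimal sketch of following these keys with the standard library is shown below. It assumes comma-separated files whose header row uses the column names listed here; the helper names and the ids in the example are hypothetical.

```python
import csv


def read_table(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def index_by(rows, key):
    """Build a lookup from a unique id column to its row."""
    return {row[key]: row for row in rows}


def site_of_recording(recording, deploys_by_id):
    """Follow recording -> deployment -> site using the shared id columns."""
    return deploys_by_id[recording["deploy.id"]]["site.id"]


# With the downloaded files this could be used as:
#   deploys_by_id = index_by(read_table("deploys.csv"), "deploy.id")
#   for rec in read_table("recordings.csv"):
#       print(rec["recording.id"], site_of_recording(rec, deploys_by_id))

# Made-up rows for illustration only.
demo_deploys = index_by([{"deploy.id": "d1", "aru.id": "a3", "site.id": "s7"}], "deploy.id")
print(site_of_recording({"recording.id": "r1", "deploy.id": "d1"}, demo_deploys))
```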

 

selections.csv

manually labelled woodcock call elements (i.e. croaks, whistles, chases). Short selections were extended to 3 seconds by symmetrically adding time before and after the original selection. The table follows the format of Raven Pro selection tables.

 

 columns

selec = unique id of the selection. Corresponds to the filename of the wav-files in the archive selections.zip

deploy.id = unique id of the aru deployment during which the selection was recorded

channel = audio channel

start = start of the selection in seconds from the start of the recording

end = end of the selection in seconds from the start of the recording

bottom.freq = bottom frequency of the annotation bounding box

top.frequency = top frequency of the annotation bounding box

species.code = species code as used by BirdNET

common.name = English common name as used by BirdNET

annotation = contains annotations of call elements that are pooled in the roding event. Thus typically equal to the number of annotated call elements

recording.id = id of the recording this roding event originates from

events.csv

aggregated roding events consisting of continuous sequences of manually labelled call elements. The table follows the format of Raven Pro selection tables.

 

 columns

event.id = unique id of roding event. Corresponds to the filename of the wav-files in the archive events.zip 

channel = audio channel

start = start of the event in seconds from the start of the recording

end = end of the event in seconds from the start of the recording

bottom.freq = bottom frequency of the annotation bounding box

top.frequency = top frequency of the annotation bounding box

species.code = species code as used by BirdNET

common.name = English common name as used by BirdNET

annotation = contains annotations of call elements that are pooled in the roding event. Thus typically equal to the number of annotated call elements

recording.id = id of the recording this roding event originates from

deploy.id = unique id of the aru deployment during which the roding event was recorded
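
Because start and end are given in seconds from the start of the recording, the absolute time of an event can be reconstructed by combining them with the date and time columns of recordings.csv. The sketch below assumes the documented formats (YYYY-MM-DD and UTC hh:mm:ss); the function name and the example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def event_start_utc(rec_date, rec_time, offset_seconds):
    """Absolute UTC start time of an event inside a recording."""
    started = datetime.strptime(f"{rec_date} {rec_time}", "%Y-%m-%d %H:%M:%S")
    started = started.replace(tzinfo=timezone.utc)  # times are documented as UTC
    return started + timedelta(seconds=float(offset_seconds))


# Made-up recording start and event offset, as they would be read from the CSV files.
print(event_start_utc("2021-05-14", "19:30:00", "754.2").isoformat())
```

Converting the result to local time, for example to relate roding activity to sunset, would be a separate step that is not covered here.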

removed_audio_files.txt

selection ids and event ids of audio files that were deleted because they included voices. Their metadata is still included in the files described above.
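
Since the tables still list selections and events whose audio was deleted, it can be useful to filter those rows out before trying to open the wav-files. The sketch below assumes that removed_audio_files.txt contains one id per line, which is not documented here; the helper names and ids are hypothetical.

```python
def parse_removed_ids(text):
    """Assumption: one id per line; surrounding whitespace and blank lines are ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def rows_with_audio(rows, id_column, removed_ids):
    """Keep only the table rows whose audio file was not removed."""
    return [row for row in rows if row[id_column] not in removed_ids]


# Made-up ids for illustration only.
removed = parse_removed_ids("ev_0012\n\nev_0345\n")
events = [{"event.id": "ev_0011"}, {"event.id": "ev_0012"}]
print(rows_with_audio(events, "event.id", removed))
```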

 

Data privacy

Selections and roding events were checked for human voices, and any audio file that contained voices was removed. Audio segments that did not contain woodcock calls were not completely checked for human voices and can thus not be made available.

Files

Files (1.0 GB)

Checksum Size
md5:fced44eb9eb0da0410484e3d7702aca1 2.2 kB
md5:96e42ef0d4707daeca0ca88a8c1cccb9 6.3 kB
md5:4c25206f6551798de7a05474a099791b 102.7 kB
md5:6d55ceffa54c12021b252d70a84e418d 33.6 kB
md5:3e20de844ec237ef702900c5203112c2 207 Bytes
md5:09dbf499229f30be689e3e947f54d2e6 442.1 MB
md5:5202477c5432f06b4675f278e66e2b35 395.8 kB
md5:3ae98e81f5002ebb5f40525b257a3e8c 559.2 MB
md5:156ba49ab954d33ffe30c48b02b9de5c 12.6 kB
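
A downloaded file can be checked against its listed MD5 checksum with a few lines of standard-library Python. This is a generic sketch: the function names are hypothetical, and because the listing above does not pair checksums with filenames, the matching of file and checksum is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>' or as a bare digest."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the two large zip archives.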

Additional details

Funding

Ministerium für Ländlichen Raum und Verbraucherschutz Baden-Württemberg

Dates

Collected
2020/2021
data collection
Submitted
2024-04
submission to Zenodo