---
license: cc0-1.0
language:
- en
pretty_name: The Wilds Bioacoustics Monitors
task_categories:
- audio-classification
tags:
- biology
- audio
- animals
- conservation
- bioacoustics
- wildlife
- soundscape
- ecology
size_categories:
- 100<n<1K
---

# Dataset Card for The Wilds Bioacoustics Monitors

This dataset contains passive acoustic recordings collected at [The Wilds safari park](https://www.thewilds.org/) in Ohio during summer 2025.
Recorders captured ambient soundscapes to support ecological monitoring, animal behavior analysis, and acoustic biodiversity modeling.

## Dataset Details

### Dataset Description

This dataset was created to support multimodal wildlife monitoring research using passive acoustic monitoring. Bioacoustic data were collected using Wildlife Acoustics Song Meter devices deployed across four field sites at The Wilds. The recordings capture natural soundscapes including wildlife vocalizations, environmental sounds, and ambient audio that can be used for species detection, behavioral analysis, and biodiversity assessment.

### Supported Tasks and Leaderboards

- **Audio Classification:** Species identification from acoustic recordings
- **Sound Event Detection:** Detection and localization of animal vocalizations
- **Biodiversity Assessment:** Acoustic diversity indices and community analysis
- **Behavioral Analysis:** Temporal activity patterns and acoustic behavior studies
- **Soundscape Ecology:** Environmental audio analysis and habitat characterization

No benchmarks or leaderboards are currently available.

## Dataset Structure

The dataset is organized hierarchically by site and deployment session:

```
/dataset/
    bioacoustic.txt
    The_Wilds_Bioacoustic_Log2025-06-30_21_54_59.csv
    The_Wilds_Bioacoustic_Log2025-07-04_20_18_38.csv
    TW05-SM01/
        metadata.md
        SD01_20250630_20250703/
            SM001_20250630_195900.wav
            SM001_20250630_200402.wav
            SM001_20250630_200902.wav
            ...
            SM001_20250703_064902.wav
            SM001_20250703_065402.wav
            SM001_20250703_065902.wav
    TW06-SM03/
        metadata.md
        SD03_20250630_20250703/
            SM03_20250630_120000.wav
            SM03_20250630_130000.wav
            SM03_20250630_140000.wav
            ...
            SM03_20250703_140000.wav
            SM03_20250703_150000.wav
            SM03_20250703_160000.wav
    TW07-SM02/
        metadata.md
        SD02_20250630_20250703/
            SM002_20250630_195900.wav
            SM002_20250630_205902.wav
            SM002_20250701_050300.wav
            ...
            SM002_20250702_205902.wav
            SM002_20250703_050400.wav
            SM002_20250703_060402.wav
    TW08-SM04/
        metadata.md
        SD04_20250630_20250703/
            SM04_20250630_120000.wav
            SM04_20250630_130000.wav
            SM04_20250630_140000.wav
            ...
            SM04_20250703_150000.wav
            SM04_20250703_160000.wav
            SM04_20250703_170000.wav
```
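
As a quick sanity check after downloading, the layout above can be walked to count recordings per recorder folder. The following is a minimal sketch, assuming the dataset has been copied to a local directory; the function name `count_wavs_by_recorder` is illustrative and not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_wavs_by_recorder(root):
    """Count .wav files under each top-level <site>-<recorder> folder."""
    root = Path(root)
    counts = Counter()
    for wav in root.rglob("*.wav"):
        parts = wav.relative_to(root).parts
        # Skip any .wav sitting directly in the root; recordings are expected
        # under folders such as "TW05-SM01/SD01_20250630_20250703/".
        if len(parts) < 2:
            continue
        counts[parts[0]] += 1
    return dict(counts)
```

Calling `count_wavs_by_recorder("path/to/dataset")` returns a dictionary keyed by folder name (for example `"TW05-SM01"`), which can be compared against the file counts listed under Data Instances.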

### Data Instances

Each bioacoustic deployment folder contains:
- **Audio files:** Recordings in .wav format, captured on a programmed schedule
- **Metadata file:** `metadata.md` with deployment information and recorder settings

**File Counts by Recorder:**
- **TW05-SM01:** 144 audio files (.wav recordings)
- **TW06-SM03:** 77 audio files (.wav recordings)
- **TW07-SM02:** 12 audio files (.wav recordings)
- **TW08-SM04:** 78 audio files (.wav recordings)

**Audio File Specifications:**
- **Format:** .wav (uncompressed)
- **Channels:** Mono
- **Bit depth:** 16-bit
- **Sample rate:** 48 kHz
- **Duration:** Variable based on recording schedule
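
The specifications above can be checked per file from the .wav header alone. This is a minimal sketch using Python's standard `wave` module, assuming the files are plain PCM .wav; the helper names `wav_summary` and `matches_card_specs` are illustrative.

```python
import wave


def wav_summary(path):
    """Read header fields from a PCM .wav file without loading the samples."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "bit_depth": wf.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_card_specs(summary):
    """Compare a summary against the mono / 16-bit / 48 kHz values listed above."""
    return (
        summary["channels"] == 1
        and summary["bit_depth"] == 16
        and summary["sample_rate_hz"] == 48000
    )
```

Because duration varies with the recording schedule, `duration_s` is reported but not checked.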

**Filename Conventions:**
- **Three-digit prefixes (SM001, SM002):** `SM00#_YYYYMMDD_HHMMSS.wav` (TW05-SM01, TW07-SM02)
- **Two-digit prefixes (SM03, SM04):** `SM0#_YYYYMMDD_HHMMSS.wav` (TW06-SM03, TW08-SM04)

**Total Dataset Size:** 311 audio files across all bioacoustic monitor deployments.

Each .wav file is a field recording captured according to programmed recording schedules. File names include timestamps indicating the start time of each recording session.
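
The start time can be recovered from a file name with a small parser. The sketch below assumes every recording follows the `<prefix>_YYYYMMDD_HHMMSS.wav` pattern seen in the directory listing; the returned timestamp is naive (no time zone attached), since the device time zone is recorded separately in `metadata.md`. The function name is illustrative.

```python
import re
from datetime import datetime

# Recorder prefix (e.g. SM001, SM03), then the date and time of recording start.
FILENAME_RE = re.compile(r"^(SM\d+)_(\d{8})_(\d{6})\.wav$")


def parse_recording_name(filename):
    """Return (prefix, start_datetime) parsed from a recording file name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    prefix, date_part, time_part = match.groups()
    start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return prefix, start
```

For example, `parse_recording_name("SM001_20250630_195900.wav")` yields the prefix `"SM001"` and a start time of 2025-06-30 19:59:00 in device-local time.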

### Data Fields

**metadata.md** (found in each recorder deployment folder):
- **Recorder ID:** Unique device identifier (SM01, SM02, SM03, SM04)
- **Device Model:** Song Meter model name (e.g., Song Meter Micro 2)
- **Device Serial Number:** Manufacturer-assigned serial number
- **Site ID:** Location code where deployed (TW05, TW06, TW07, TW08)
- **Deployment Location Description:** Text description of exact location and surroundings
- **GPS Coordinates:** Latitude and longitude in decimal format
- **Deployment Date and Time:** Recorder deployment timestamp (YYYY-MM-DD HH:MM format)
- **Retrieval Date and Time:** Recorder retrieval timestamp (YYYY-MM-DD HH:MM format)
- **Orientation / Microphone Facing:** Direction and environmental considerations (e.g., "East, away from wind and road")
- **Mounting Height:** Approximate height of microphone from ground in meters
- **Recording Schedule Preset:** Schedule or settings used for recording (e.g., "1 hour at sunrise and sunset")
- **Time Zone Set on Device:** Local time zone configured (e.g., "USA Eastern (UTC-5)")
- **Maintenance Notes:** Issues, configuration changes, or deviations from standard settings
- **Observer:** Name or initials of person completing metadata

**CSV Log Files:**
- `The_Wilds_Bioacoustic_Log2025-06-30_21_54_59.csv`: Deployment log from June 30, 2025
- `The_Wilds_Bioacoustic_Log2025-07-04_20_18_38.csv`: Retrieval log from July 4, 2025

### Data Splits

This dataset has no predefined training/validation/test splits. Data are organized by site (TW05-TW08) and deployment session. Users may create their own splits based on:
- **Temporal splits:** Using recording timestamps across the deployment period
- **Spatial splits:** Using different site locations (TW05, TW06, TW07, TW08)
- **Recorder-based splits:** Using different Song Meter devices (SM01, SM02, SM03, SM04)

Recommended approach depends on modeling goals and research questions.
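
As one possible starting point, spatial and temporal splits can be derived from the relative file paths alone. The sketch below assumes paths are given relative to the dataset root with forward slashes, as in the directory listing; the function names and the choice of a single held-out site or cutoff time are illustrative, not a recommended protocol.

```python
from datetime import datetime
from pathlib import PurePosixPath


def spatial_split(paths, held_out_site):
    """Hold out every file whose top-level folder belongs to `held_out_site`."""
    train, test = [], []
    for p in paths:
        # "TW05-SM01/..." -> site code "TW05"
        site = PurePosixPath(p).parts[0].split("-")[0]
        (test if site == held_out_site else train).append(p)
    return train, test


def temporal_split(paths, cutoff):
    """Files that start before `cutoff` go to train, the rest to test."""
    train, test = [], []
    for p in paths:
        # Stem looks like "SM03_20250630_120000".
        _, date_part, time_part = PurePosixPath(p).stem.split("_")
        start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
        (train if start < cutoff else test).append(p)
    return train, test
```

With only four sites and uneven file counts per recorder, a held-out site can remove a large share of the data, so the resulting split sizes are worth checking before training.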

## Dataset Creation

### Curation Rationale

This dataset supports biodiversity monitoring, behavioral ecology research, and the development of automated species detection and classification models from passive acoustic recordings. Bioacoustic monitoring provides complementary data to camera trap surveys and enables detection of cryptic or nocturnal species that may be missed by visual methods.

### Source Data

#### Data Collection and Processing

Recordings were collected at The Wilds safari park during summer 2025 using Wildlife Acoustics Song Meter devices. Four recorders (SM01-SM04) were strategically deployed at sites TW05-TW08 from June 30 to July 3, 2025. 

Devices were programmed for scheduled recordings with different sampling strategies across sites. Recorders were mounted on trees or posts at appropriate heights and orientations to minimize wind noise and maximize acoustic detection. Upon retrieval, audio files were organized by deployment session and basic metadata were recorded. To preserve the raw acoustic data, no audio processing, filtering, or annotation was applied.


### Annotations

#### Annotation process

No species identification or acoustic annotations are currently provided with this initial dataset release. Manual and AI-assisted labeling efforts for species detection, vocalization classification, and acoustic event annotation are planned for future versions.

#### Who are the annotators?

N/A - annotations will be added in future releases

### Personal and Sensitive Information

The dataset includes GPS coordinates within The Wilds, a public conservation park in Ohio. Some recordings may contain vocalizations from endangered or sensitive species, though specific species identifications are not currently provided. Spatial coordinates have not been redacted as they fall within a public conservation area.

## Considerations for Using the Data

Recordings exhibit natural variation in quality due to weather conditions, background noise levels, wind, and varying animal activity patterns. The recording schedules differed across sites, resulting in different temporal sampling strategies. Some recordings may contain environmental noise, equipment sounds, or human activity that should be considered during analysis.

### Bias, Risks, and Limitations

- **Sampling bias:** Recorders were deployed strategically at ecological hotspots rather than randomly, potentially overrepresenting certain acoustic environments
- **Temporal limitations:** Data represent only a 3-4 day deployment period in summer 2025, limiting seasonal and long-term temporal representation
- **Detection bias:** Passive acoustic monitoring may miss quiet vocalizations or species that vocalize outside the recorded frequency range
- **Spatial bias:** Sampling is limited to four sites within The Wilds and may not represent broader regional acoustic diversity
- **Schedule variation:** Different recording schedules across sites create uneven temporal sampling
- **Technical limitations:** Variable audio quality due to different environmental conditions and potential equipment issues

### Recommendations

Users should consider the ecological and methodological context when analyzing these data. The dataset is well-suited for proof-of-concept studies, algorithm development, and preliminary acoustic ecology analyses. Robust ecological conclusions would require combining it with additional seasonal data, broader spatial sampling, and longer deployment periods.

Future dataset releases may include additional deployment periods, seasonal data, and species annotations to address current limitations.

## Licensing Information

This dataset is dedicated to the public domain under a [CC0 license](https://creativecommons.org/publicdomain/zero/1.0/) for the benefit of scientific pursuits. Users are encouraged to cite the dataset and acknowledge contributors when using these data in research or applications.


## Glossary 

- **Passive Acoustic Monitoring:** Non-invasive method of recording environmental sounds to study wildlife
- **Song Meter:** Brand of automated acoustic recording device by Wildlife Acoustics
- **Deployment:** Period when a bioacoustic recorder is installed and actively recording
- **Site ID:** Geographic location identifier (TW05-TW08 in this dataset)
- **Recorder ID:** Individual device identifier (SM01-SM04 in this dataset)
- **Soundscape:** Acoustic environment including all sounds in a given area
