STeLiN-US: A Spatio-Temporally Linked Neighborhood Urban Sound Database
Description
Update:
- Baseline: GitHub: STeLiN-US
- Bug Fix: a minor bug involving missing brackets in some clip labels in the MetaData.xlsx file has been corrected.
- Paper: DCASE-2023: Proceedings
Introduction:
In this work, we present a novel dataset, the Spatio-temporally Linked Neighborhood Urban Sound (STeLiN-US) database. The dataset is semi-synthesized; that is, each sample is generated by combining diverse sets of real urban sounds with crawled information on real-world user behavior over time.
Aim:
We propose this dataset to equip researchers with varying ambient sound in environments that closely resemble realistic patterns. The STeLiN-US dataset simulates the acoustic character of closely interconnected urban neighborhoods.
- It has potential not only for identifying scenes but also for predicting acoustic scenarios.
- It supports user-centered applications. For example, when combined with automatic speech recognition (ASR), ASR performance can be analyzed by location and time; moreover, expected performance can be predicted in advance from the predicted busyness of the scene.
- Scene-specific events are incorporated to replicate real surrounding environments, which helps researchers test novel event detection systems.
Dataset Specification:
The STeLiN-US dataset consists of 5-minute audio segments representing 5 acoustic scenes or microphone locations:
- Street
- Metro-Station
- Park
- School-Playground
- Café
The audio segment for each scene is synthesized for 15 discrete hours of the day, from 7am to 9pm, for every day of the week from Monday to Sunday. With 5 locations, 7 days, and 15 hourly timestamps, the dataset contains 5 × 7 × 15 = 525 audio segments, totaling 43 hours 45 minutes (525 × 5 minutes). We use 14 sound classes, divided into events and backgrounds as below:
Events | Backgrounds |
---|---|
Vehicle, Children Playing, Street Music, Phone Ring, School Bell, Car Horn, Bird, and Dog Bark | Train, Pedestrian, Cafe Crowd, Urban Park, River, and Fountain |
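The segment count and total duration stated above can be reproduced with a few lines of Python. This is a minimal sketch; the day abbreviations other than "Mon" and "Fri" (which appear in the file names below) are assumed rather than taken from the dataset.

```python
# Minimal sketch reproducing the segment count and total duration.
# Day abbreviations other than "Mon" and "Fri" are assumed, not confirmed.
SCENES = ["Street", "Metro-Station", "Park", "School-Playground", "Cafe"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
HOURS_24 = list(range(7, 22))  # 7am .. 9pm inclusive -> 15 hourly timestamps
SEGMENT_MINUTES = 5


def dataset_totals():
    """Return (number of segments, total hours, leftover minutes)."""
    n_segments = len(SCENES) * len(DAYS) * len(HOURS_24)
    total_minutes = n_segments * SEGMENT_MINUTES
    hours, minutes = divmod(total_minutes, 60)
    return n_segments, hours, minutes


if __name__ == "__main__":
    n, h, m = dataset_totals()
    print(f"{n} segments, {h} h {m} min")  # prints "525 segments, 43 h 45 min"
```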
Sound classes and the source datasets used for synthesis:
Sound Class | Source Dataset |
---|---|
Vehicle | IDMT Traffic |
Train, Cafe Crowd, Urban Park | TUT Rare Sound Events 2017 |
Pedestrian | TAU Urban Acoustic Scenes 2020 Mobile |
Children Playing, Street Music | UrbanSound |
Phone Ring | NIGENS |
School Bell, River, Fountain | Freesound |
Car Horn, Dog Bark | UrbanSound8K |
Bird | ESC-50 |
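The two tables above can be encoded as a lookup structure, for example to select classes by type or by source dataset. A minimal sketch; the class-name strings are copied from this README, and whether MetaData.xlsx uses exactly the same spellings is an assumption. `classes_from` is a hypothetical helper.

```python
# Sound class -> source dataset, as listed in the table above.
SOURCE_DATASET = {
    "Vehicle": "IDMT Traffic",
    "Train": "TUT Rare Sound Events 2017",
    "Cafe Crowd": "TUT Rare Sound Events 2017",
    "Urban Park": "TUT Rare Sound Events 2017",
    "Pedestrian": "TAU Urban Acoustic Scenes 2020 Mobile",
    "Children Playing": "UrbanSound",
    "Street Music": "UrbanSound",
    "Phone Ring": "NIGENS",
    "School Bell": "Freesound",
    "River": "Freesound",
    "Fountain": "Freesound",
    "Car Horn": "UrbanSound8K",
    "Dog Bark": "UrbanSound8K",
    "Bird": "ESC-50",
}

# Background classes from the Events/Backgrounds table; every other class is an event.
BACKGROUNDS = {"Train", "Pedestrian", "Cafe Crowd", "Urban Park", "River", "Fountain"}
EVENTS = set(SOURCE_DATASET) - BACKGROUNDS


def classes_from(source):
    """Return the sound classes drawn from a given source dataset, sorted by name."""
    return sorted(cls for cls, src in SOURCE_DATASET.items() if src == source)
```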
Naming Convention:
[Day]_[Microphone Location/Scene]_[Time].wav
e.g., “Mon_Park_3pm.wav” represents the Park scene on Monday, synthesized for 3pm.
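Because every clip name follows this pattern, the day, scene, and hour can be recovered from the file name alone. A minimal sketch; `parse_clip_name` is a hypothetical helper, and it assumes times are always written as an hour followed by `am`/`pm`, as in the examples here.

```python
import re

# [Day]_[Microphone Location/Scene]_[Time].wav, e.g. "Mon_Park_3pm.wav".
# Assumes the time is an hour number followed by "am" or "pm".
FILENAME_RE = re.compile(
    r"^(?P<day>[A-Za-z]+)_(?P<scene>[A-Za-z-]+)_(?P<hour>\d{1,2})(?P<ampm>am|pm)\.wav$"
)


def parse_clip_name(name):
    """Split a clip file name into (day, scene, hour on a 24-hour clock)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    hour = int(match["hour"]) % 12  # maps 12am/12pm to 0 before the pm offset
    if match["ampm"] == "pm":
        hour += 12
    return match["day"], match["scene"], hour
```

For example, `parse_clip_name("Mon_Park_3pm.wav")` returns `("Mon", "Park", 15)`, which makes it straightforward to group clips by hour or weekday.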
File Structure:
STeLiN-US
| MetaData.xlsx
| Traffic_Temporal_MetaData.xlsx
|
|____Readme
| | README.md
| | Map_Paper.png
|
|
|____Audio
| |____Street
| | | Fri_Street_1pm.wav    (file naming convention: [Day]_[Microphone Location/Scene]_[Time].wav)
| | | Fri_Street_2pm.wav
| | | …
|
| |____Metro-Station
| | | Fri_Metro-Station_1pm.wav
| | | Fri_Metro-Station_2pm.wav
| | | …
|
| |____Park
| | | Fri_Park_1pm.wav
| | | Fri_Park_2pm.wav
| | | …
|
| |____School-Playground
| | | Fri_School-Playground_1pm.wav
| | | Fri_School-Playground_2pm.wav
| | | …
|
| |____Cafe
| | | Fri_Cafe_1pm.wav
| | | Fri_Cafe_2pm.wav
| | | …
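To check that an extracted copy is complete, one can count the clips in each scene folder. A minimal sketch, assuming the archive extracts to a folder laid out as in the tree above (`<root>/Audio/<Scene>/*.wav`); `count_clips_per_scene` is a hypothetical helper, not part of the dataset release.

```python
from collections import Counter
from pathlib import Path


def count_clips_per_scene(root):
    """Count .wav files in each scene folder under <root>/Audio.

    Assumes the layout <root>/Audio/<Scene>/<clip>.wav; other files are ignored.
    """
    audio_dir = Path(root) / "Audio"
    counts = Counter()
    for wav in sorted(audio_dir.glob("*/*.wav")):
        counts[wav.parent.name] += 1  # the parent folder name is the scene
    return dict(counts)
```

Under the 7 days × 15 hours schedule described above, each complete scene folder would be expected to hold 105 clips, so `count_clips_per_scene("STeLiN-US")` can be compared against that figure.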
Contact person:
This is preliminary work, and we look forward to improving the current version of the dataset. Suggestions are most welcome; please send your feedback to:
- Snehit Chunarkar: snehitc@gmail.com
Files
Name | Size | MD5 |
---|---|---|
STeLiN-US.zip | 4.0 GB | c020de1eeccc5ab7a94acaee4d0ac2a6 |
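The integrity of the downloaded archive can be checked against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib`; the local file name `STeLiN-US.zip` is assumed to match the listed download, and `md5_of`/`verify_download` are hypothetical helpers.

```python
import hashlib

EXPECTED_MD5 = "c020de1eeccc5ab7a94acaee4d0ac2a6"  # checksum from the file listing


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads avoid loading the 4 GB archive into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path="STeLiN-US.zip", expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of(path) == expected.lower()
```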
Additional details
Dates
- Accepted: 2023