Published September 2023 | Version v2
Dataset Open

STeLiN-US: A Spatio-Temporally Linked Neighborhood Urban Sound Database

  • 1. NTHU, Taiwan

Description

Update: 

Introduction:

In this work, we present a novel dataset, the Spatio-temporally Linked Neighborhood Urban Sound (STeLiN-US) database. The dataset is semi-synthesized: each sample is generated by combining diverse sets of real urban sound recordings with crawled information about real-world user behavior over time.

 

Aim:

We propose this dataset to provide researchers with varying surrounding sound in an environment that closely follows realistic patterns. The STeLiN-US dataset simulates the acoustic appearance of closely interconnected neighborhoods in urban areas. The dataset:

  1. Has potential not only for identifying scenes but also for predicting acoustic scenarios.
  2. Accommodates user-centered applications. For example, when combined with automatic speech recognition (ASR), ASR performance can be analyzed by location and time; moreover, likely performance can be predicted in advance from the predicted busyness of the scene.
  3. Incorporates scene-specific events to replicate real surrounding environments, which helps researchers test novel event detection systems.

 

Dataset Specification:

The STeLiN-US dataset consists of 5-minute audio segments representing 5 acoustic scenes, or microphone locations:

  1. Street
  2. Metro-Station
  3. Park
  4. School-Playground
  5. Café

An audio segment is synthesized at each scene for 15 discrete hours of the day, from 7am to 9pm, on each day of the week from Monday to Sunday. With 5 locations, 7 days, and 15 discrete timestamps, this amounts to 525 audio segments with a total duration of 43 hours 45 minutes. We use 14 acoustic sound classes, divided into events and backgrounds as below:
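
The segment count and total duration follow directly from the figures in this section. A minimal Python sketch of the arithmetic (the constant and variable names are chosen here for illustration and are not part of the dataset):

```python
from datetime import timedelta

N_SCENES = 5          # Street, Metro-Station, Park, School-Playground, Café
N_DAYS = 7            # Monday to Sunday
N_HOURS = 15          # 7am to 9pm inclusive
SEGMENT_MINUTES = 5   # length of each audio segment

n_segments = N_SCENES * N_DAYS * N_HOURS
total = timedelta(minutes=n_segments * SEGMENT_MINUTES)

# Split the total duration into whole hours and leftover minutes.
hours, remainder = divmod(int(total.total_seconds()), 3600)
print(n_segments, f"{hours} h {remainder // 60} min")  # 525 43 h 45 min
```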

Events: Vehicle, Children Playing, Street Music, Phone Ring, School Bell, Car Horn, Bird, and Dog Bark
Backgrounds: Train, Pedestrian, Cafe Crowd, Urban Park, River, and Fountain

Sound Classes and Source Datasets Used for the Synthesis:

Sound Class                     | Source Dataset
Vehicle                         | IDMT Traffic
Train, Cafe Crowd, Urban Park   | TUT Rare Sound Events 2017
Pedestrian                      | TAU Urban Acoustic Scenes 2020 Mobile
Children Playing, Street Music  | UrbanSound
Phone Ring                      | NIGENS
School Bell, River, Fountain    | Freesound
Car Horn, Dog Bark              | UrbanSound8K
Bird                            | ESC-50

 

Naming Convention:

[Day]_[Microphone Location/Scene]_[Time].wav, e.g., “Mon_Park_3pm.wav” represents the Park scene on Monday, synthesized at 3pm.
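
File names in this format can be split back into their parts programmatically. The following is a minimal Python sketch, not part of the dataset: the function name `parse_segment_name` is hypothetical, and it assumes three-letter day abbreviations (as in "Mon") and times written as an hour followed by "am" or "pm".

```python
import re

# Pattern for "[Day]_[Scene]_[Time].wav"; scene names may contain hyphens.
# Assumption: days are three letters and times look like "3pm" or "11am".
_NAME_RE = re.compile(
    r"^(?P<day>[A-Za-z]{3})_(?P<scene>.+)_(?P<hour>\d{1,2})(?P<ampm>am|pm)\.wav$"
)

def parse_segment_name(filename):
    """Split a segment file name into (day, scene, hour in 24-hour form)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    hour = int(match["hour"]) % 12      # maps 12 to 0 before the pm offset
    if match["ampm"] == "pm":
        hour += 12                      # so "12pm" becomes 12 and "3pm" becomes 15
    return match["day"], match["scene"], hour

print(parse_segment_name("Mon_Park_3pm.wav"))  # ('Mon', 'Park', 15)
```

Converting the time to a 24-hour integer makes it straightforward to sort segments chronologically or to group them by time of day.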

 

File Structure:

STeLiN-US
|    MetaData.xlsx
|    Traffic_Temporal_MetaData.xlsx
|
|____Readme
|    |    README.md
|    |    Map_Paper.png
|
|
|____Audio
|    |____Street
|    |    |    Fri_Street_1pm.wav            file naming convention: [Day]_[Microphone Location/Scene]_[Time].wav
|    |    |    Fri_Street_2pm.wav
|    |    |    …
|
|    |____Metro-Station
|    |    |    Fri_Metro-Station_1pm.wav
|    |    |    Fri_Metro-Station_2pm.wav
|    |    |    …
|
|    |____Park
|    |    |    Fri_Park_1pm.wav
|    |    |    Fri_Park_2pm.wav
|    |    |    …
|
|    |____School-Playground
|    |    |    Fri_School-Playground_1pm.wav
|    |    |    Fri_School-Playground_2pm.wav
|    |    |    …
|
|    |____Cafe
|    |    |    Fri_Cafe_1pm.wav
|    |    |    Fri_Cafe_2pm.wav
|    |    |    …
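
Given this layout, the full set of expected audio paths can be enumerated and compared against an extracted copy of the archive. The sketch below is illustrative only: the helper names are hypothetical, the day abbreviations other than "Mon" and "Fri" are assumed, and noon is assumed to be written "12pm".

```python
from pathlib import Path

# Folder names as shown in the tree (note "Cafe" without the accent).
SCENES = ["Street", "Metro-Station", "Park", "School-Playground", "Cafe"]
# Assumed three-letter abbreviations; only "Mon" and "Fri" appear in the description.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def hour_label(hour_24):
    """Format a 24-hour value as in the file names, e.g. 15 -> '3pm'."""
    suffix = "am" if hour_24 < 12 else "pm"
    return f"{(hour_24 - 1) % 12 + 1}{suffix}"

def expected_paths(root):
    """Yield the path of every segment the description implies (7am to 9pm)."""
    audio = Path(root) / "Audio"
    for scene in SCENES:
        for day in DAYS:
            for hour in range(7, 22):
                yield audio / scene / f"{day}_{scene}_{hour_label(hour)}.wav"

paths = list(expected_paths("STeLiN-US"))
print(len(paths))  # 525
```

Filtering `paths` with `Path.exists()` after extraction would list any segments that are missing or named differently from what is assumed here.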

 

Contact person:

This is preliminary work, and we look forward to improving the present version of the dataset. Suggestions are most welcome; please send your feedback to:

Files

STeLiN-US.zip

Files (4.0 GB)

md5:c020de1eeccc5ab7a94acaee4d0ac2a6
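
The MD5 checksum listed for the archive can be used to confirm that a download completed intact. A minimal Python sketch, assuming the archive was saved locally as `STeLiN-US.zip` (the function names are hypothetical):

```python
import hashlib

EXPECTED_MD5 = "c020de1eeccc5ab7a94acaee4d0ac2a6"  # checksum listed for the archive

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("STeLiN-US.zip")` returns `False` for a truncated or corrupted download. Reading in chunks avoids loading the 4.0 GB archive into memory at once.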

Additional details

Dates

Accepted
2023