Published October 15, 2021 | Version 1.0.0

SONYC-Backgrounds: a collection of urban background recordings from an acoustic sensor network



Created by

Aurora Cramer (1, 2), Mark Cartwright (3), Fatemeh Pishdadian (4), Juan Pablo Bello (1, 2, 5, 6)

    1. Music and Audio Research Lab, New York University
    2. Department of Electrical and Computer Engineering, New York University
    3. Department of Informatics, New Jersey Institute of Technology
    4. Interactive Audio Lab, Northwestern University
    5. Center for Urban Science and Progress, New York University
    6. Department of Computer Science and Engineering, New York University


Publication

If you use this data in your work, please cite the following paper, which introduced this dataset:

[1] Cramer, A., Cartwright, M., Pishdadian, F., and Bello, J.P. Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021.


Description

SONYC-Backgrounds is an open dataset of recordings of urban background noise obtained from the SONYC acoustic sensor network [2]. This dataset was developed with the goal of synthesizing soundscapes with a diverse set of realistic-sounding background activity, for use in developing and evaluating machine listening systems in urban settings.


Data acquisition

The provided audio was acquired using the SONYC acoustic sensor network for urban noise pollution monitoring [2]. Over 50 different sensors have been deployed in New York City. All recordings are 10 seconds long and were recorded with identical microphones at identical gain settings.


Recording selection

From the large collection of audio recordings acquired in 2017, we select a much smaller subset of likely background recordings. We first process the dataset with a sensor fault detector to filter out recordings containing artifacts caused by hardware failures in the sensors. The sensor fault detector is a random forest, trained on a small collection of audio examples using active learning [3].
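As a rough illustration of this filtering step (not the released code), the sketch below fits a stand-in random forest on placeholder features and keeps only the recordings it predicts to be clean; the real features, labels, and active learning loop are described in [3].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the sensor fault detector of [3]; the feature vectors and
# labels below are random placeholders, not the real training data.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(100, 16))      # per-recording feature vectors
y_labeled = rng.integers(0, 2, size=100)    # 1 = sensor fault, 0 = clean
fault_detector = RandomForestClassifier(n_estimators=100, random_state=0)
fault_detector.fit(X_labeled, y_labeled)

# Filtering step: keep only candidate recordings predicted to be clean.
X_candidates = rng.normal(size=(1000, 16))
clean_mask = fault_detector.predict(X_candidates) == 0
clean_indices = np.flatnonzero(clean_mask)
```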

We then determine whether a recording is background using an urban sound classifier trained to detect the presence of sources of interest to urban noise pollution monitoring [4, 5]. We use the classifier to find recordings that *do not* contain the sound classes of interest. The classifier is a multi-layer perceptron with two hidden layers, which takes as input an OpenL3 embedding [6] of a 1 s clip of audio and produces multi-label prediction probabilities for each class. This model is nearly identical to the baseline model for the DCASE 2019 Challenge Urban Sound Tagging task, aside from the addition of an extra hidden layer.

Predictions for entire recordings are obtained by max-pooling the predictions for each class across time. A recording is considered background if the probabilities of the target classes fall below their respective detection thresholds, i.e. no target classes are detected. The classifier was trained on the SONYC-UST v1 dataset [4], and the detection thresholds for each class were tuned to correspond to 70% negative recall (true negative rate) on the test set to increase the likelihood that recordings are background.
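A minimal sketch of this decision rule, assuming per-second prediction probabilities are already available, is given below; the function name and example numbers are illustrative rather than taken from the released code.

```python
import numpy as np

def is_background(frame_probs, thresholds):
    """Decide whether a recording is background.

    frame_probs: (n_frames, n_classes) per-second multi-label prediction
        probabilities, e.g. from the MLP over OpenL3 embeddings of 1 s clips.
    thresholds: (n_classes,) per-class detection thresholds (tuned here to
        70% negative recall on the SONYC-UST v1 test set).
    """
    clip_probs = frame_probs.max(axis=0)           # max-pool over time
    return bool(np.all(clip_probs < thresholds))   # keep only if no class is detected

# Illustrative call with made-up numbers (2 frames, 2 classes):
# is_background(np.array([[0.1, 0.3], [0.2, 0.4]]), np.array([0.5, 0.6]))  # -> True
```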

After this selection process, we obtain 441 background clips.

 

Metadata

To maintain privacy, the recordings in this release are spread out in time and location, and recording times have been quantized to the hour. Sensor IDs are consistent with those in the SONYC-UST dataset [4]. The corresponding sensor locations can be found in the SONYC-UST v2 dataset [5], though these locations have been mapped to the "block" level to maintain privacy. See the DCASE 2020 Challenge Urban Sound Tagging with Spatiotemporal Context task page for more information on the metadata.


Data splits

The dataset is partitioned into a train/valid/test split of roughly 60/20/20, using a simple greedy method to assign sensors to subsets.
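The exact greedy procedure is not detailed here; the sketch below shows one plausible variant that assigns whole sensors, largest first, to whichever split is currently furthest below its target share. It is illustrative only, and the released splits should be used as distributed.

```python
from collections import Counter

def greedy_sensor_split(recording_sensors, targets=None):
    """Assign whole sensors to splits so recording counts roughly match the targets.

    recording_sensors: list of sensor IDs, one entry per recording.
    Returns a dict mapping sensor ID -> split name.
    """
    if targets is None:
        targets = {"train": 0.6, "valid": 0.2, "test": 0.2}
    counts = Counter(recording_sensors)
    total = sum(counts.values())
    assigned = {name: 0 for name in targets}
    sensor_to_split = {}
    # Place the largest sensors first so smaller ones can balance the remainder.
    for sensor, n in counts.most_common():
        # Choose the split currently furthest below its target proportion.
        split = min(assigned, key=lambda s: assigned[s] / total - targets[s])
        sensor_to_split[sensor] = split
        assigned[split] += n
    return sensor_to_split
```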


Files

The dataset directory contains the directories `train`, `valid`, and `test` for the respective data subsets. Each directory contains recordings named with the format `<sensor-id>_<year>-<month>-<day>_<hour>_<instance-num>.wav`, where `<instance-num>` distinguishes recordings from the same sensor occurring during the same hour. Aside from `<year>`, each field in the format is zero-padded to two places (i.e., `printf` format `"%02d"`).
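For convenience, a small parsing helper is sketched below. It assumes two-digit, zero-padded sensor IDs per the note above, and the example filename in the comment is hypothetical.

```python
import re
from datetime import datetime

# Pattern for "<sensor-id>_<year>-<month>-<day>_<hour>_<instance-num>.wav",
# e.g. "05_2017-03-14_07_00.wav" (the example name is made up).
FILENAME_RE = re.compile(
    r"^(?P<sensor>\d{2})_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"_(?P<hour>\d{2})_(?P<instance>\d{2})\.wav$"
)

def parse_recording_name(name):
    """Return (sensor_id, hour-quantized timestamp, instance number)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    timestamp = datetime(int(m["year"]), int(m["month"]), int(m["day"]), int(m["hour"]))
    return m["sensor"], timestamp, int(m["instance"])
```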


Conditions of use

Dataset created by Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, and Juan Pablo Bello.

The SONYC-Backgrounds dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/

The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, New York University is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the SONYC-Backgrounds dataset or any part of it.

 

Contact

If you have any questions, comments, or concerns, please direct correspondence to Aurora Cramer (aurora (dot) linh (dot) cramer (at) gmail (dot) com).


References and Links

[1] Cramer, A., Cartwright, M., Pishdadian, F., and Bello, J.P. Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021.

[2] Bello, J.P., Silva, C., Nov, O., Dubois, R.L., Arora, A., Salamon, J., Mydlarz, C., and Doraiswamy, H. SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution. Communications of the ACM, 62(2), 68-77, 2019.

[3] Wang, Y., Mendez, A.E.M., Cartwright, M., and Bello, J.P. Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

[4] Cartwright, M., Mendez, A.E.M., Cramer, A., Lostanlen, V., Dove, G., Wu, H., Salamon, J., Nov, O., and Bello, J.P. SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.

[5] Cartwright, M., Cramer, A., Mendez, A.E.M., Wang, Y., Wu, H., Lostanlen, V., Fuentes, M., Dove, G., Mydlarz, C., Salamon, J., Nov, O., and Bello, J.P. SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

[6] Cramer, A., Wu, H.-H., Salamon, J., and Bello, J.P. Look, Listen and Learn More: Design Choices for Deep Audio Embeddings. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.


Acknowledgements

We would like to thank all those involved in the SONYC project (https://wp.nyu.edu/sonyc/people/). This work is partially supported by National Science Foundation award 1633259 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1633259) and award 1544753 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1544753).

 


Files (307.2 MB)

md5:62f9f8549134afb1b831aae08eb19a4b

Additional details

Related works

Is published in: arXiv:2105.02911 (preprint)