Published September 4, 2021 | Version 1.0a
Dataset Open

SINGA:PURA (SINGApore: Polyphonic URban Audio)

Description

SINGA:PURA Dataset (v1.0a)

This repository contains the strongly-labelled subset of recordings of the SINGA:PURA (SINGApore: Polyphonic URban Audio) dataset and corresponding metadata, formatted in a manner compatible with a soundata dataset loader.

Please note that this repository does not contain the unlabelled recordings of the SINGA:PURA dataset! If you wish to access the unlabelled recordings, please refer to https://doi.org/10.21979/N9/Y8UQ6F for the full version (v1.0) of the SINGA:PURA dataset (which contains both the strongly-labelled and unlabelled recordings).

Regarding this repository

The SINGA:PURA dataset is a polyphonic urban sound dataset with spatiotemporal context. It contains 6547 strongly-labelled and 72406 unlabelled recordings from a wireless acoustic sensor network deployed in Singapore to identify and mitigate noise sources. However, this repository only contains the subset of 6547 strongly-labelled recordings from the SINGA:PURA dataset and their corresponding labels, formatted in a manner compatible with a soundata dataset loader. The recordings are all 10 seconds in length and have either 1 or 7 channels, depending on the recording device used.

The readme file in this repository ("Readme.md") contains the same information as this description: a short description of the organisation of this repository, as well as our label taxonomy and the dataset itself. For full details regarding the sensor units used, the recording conditions, and the annotation methodology, please refer to our conference paper below:

K. Ooi, K. N. Watcharasupat, S. Peksi, F. A. Karnapi, Z.-T. Ong, D. Chua, H.-W. Leow, L.-L. Kwok, X.-L. Ng, Z.-A. Loh, W.-S. Gan, "A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context," in 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2021.

The conference paper has also been included in this repository as "APSIPA.pdf".

Directory structure

This repository contains a total of 9 files. 5 of the files ("labelled.zip", "labelled.z01", "labelled.z02", "labelled.z03", "labelled.z04") form a multi-part ZIP archive that, when extracted, contains the subset of 6547 strongly-labelled recordings (in FLAC format) in the SINGA:PURA dataset, organised in folders by date of recording. The other 4 files are:

  • "APSIPA.pdf": A PDF copy of the conference paper describing the dataset, recording and annotation methodology in detail.
  • "labelled_metadata_public.csv": A CSV file containing the metadata for the 6547 strongly-labelled recordings. Each row corresponds to a single recording. See the section titled "Metadata CSV file" for more information.
  • "labels_public.zip": A ZIP archive that, when extracted, contains 6547 CSV files that each contain the strong labels for their corresponding strongly-labelled recording. The names of the CSV files are identical to the names of the corresponding FLAC files containing the recordings, save for the file extension. Each row corresponds to a single acoustic event. See "Labels CSV files" for more information.
  • "Readme.md": The readme file for this repository.

Each numbered part of the multi-part ZIP archive is 1000 MB in size, which makes the dataset about 4.5 GB in total. Please ensure that your connection has sufficient bandwidth to support the download. It may also be useful to use a download manager for downloading the individual files of the dataset. To extract the multi-part ZIP archive, it may be helpful to use either WinRAR or WinZip.
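
Because extraction needs all five parts to be present and intact, it can help to check the downloads first. The following is a minimal sketch, not part of the dataset tooling, that computes the MD5 digest of a file (for manual comparison against the checksums shown in the file listing of this record) and reports which archive parts are missing from a folder. The function names `md5_of_file` and `missing_parts` are hypothetical helpers introduced here for illustration.

```python
import hashlib
from pathlib import Path

# Names of the five archive parts, as listed in "Directory structure".
ARCHIVE_PARTS = (
    "labelled.zip",
    "labelled.z01",
    "labelled.z02",
    "labelled.z03",
    "labelled.z04",
)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_parts(folder, names=ARCHIVE_PARTS):
    """Return the archive parts that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# Example usage (the download folder is an assumption):
#   print(missing_parts("downloads"))
#   print(md5_of_file("downloads/labelled.zip"))
```

Reading in chunks keeps memory use low for the 1000 MB parts. The digest still has to be compared by hand with the value shown for the corresponding file.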

After extraction, the directory structure of this repository should be as follows:

.
├─ labelled
│  ├─ 2020-08-03
│  │  └─ [b827eb7d576e][2020-08-03T23-32-11Z][manual][---][565a40f866f3d2804332ca7896a4c77d][93.29-86.29 66.65]!-90.flac
│  │
│  ├─ 2020-08-17
│  │  └─ <.flac files>
│  │
│  ├─ ...
│  │
│  └─ 2020-10-31
│     └─ <.flac files>
│
├─ labels_public
│  ├─ [b827eb0a63c9][2020-08-20T11-29-04Z][manual][---][de313d12d7f31937615be80cc47a1ad9][]-53.csv
│  ├─ [b827eb0a63c9][2020-08-20T11-30-04Z][manual][---][de313d12d7f31937615be80cc47a1ad9][]-54.csv
│  ├─ ...
│  └─ [b827ebf3744c][2020-09-02T06-53-04Z][manual][---][4edbade2d41d5f80e324ee4f10d401c0][]-1647.csv
│
├─ APSIPA.pdf
├─ labelled_metadata_public.csv
└─ Readme.md

Label taxonomy

Our label taxonomy is derived from the taxonomy used in the SONYC-UST datasets, but has been adapted to fit the local (Singapore) context while retaining compatibility with the SONYC-UST ontology. We chose this taxonomy so that the SINGA:PURA dataset can be used in conjunction with the SONYC-UST datasets when training urban sound tagging models, simply by omitting from the SINGA:PURA recordings the labels that are absent in the SONYC-UST taxonomy. For more information regarding the SONYC-UST datasets, please refer to the following paper published by the SONYC team:

M. Cartwright, J. Cramer, A. E. M. Mendez, Y. Wang, H. Wu, V. Lostanlen, M. Fuentes, G. Dove, C. Mydlarz, J. Salamon, O. Nov, J. P. Bello, "SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context," in Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

Specifically, our label taxonomy consists of 14 coarse-grained classes and 40 fine-grained classes. Their organisation is as follows:

─┬─ 1. Engine ───────────────┬─ 1. Small engine
 │                           ├─ 2. Medium engine
 │                           └─ 3. Large engine
 ├─ 2. Machinery impact ─────┬─ 1. Rock drill
 │                           ├─ 2. Jackhammer
 │                           ├─ 3. Hoe ram
 │                           └─ 4. Pile driver
 ├─ 3. Non-machinery impact ─┬─ 1. Glass breaking*
 │                           ├─ 2. Car crash*
 │                           └─ 3. Explosion*
 ├─ 4. Powered saw ──────────┬─ 1. Chainsaw
 │                           ├─ 2. Small/medium rotating saw
 │                           └─ 3. Large rotating saw
 ├─ 5. Alert signal ─────────┬─ 1. Car horn
 │                           ├─ 2. Car alarm
 │                           ├─ 3. Siren
 │                           └─ 4. Reverse beeper
 ├─ 6. Music ────────────────┬─ 1. Stationary music
 │                           └─ 2. Mobile music
 ├─ 7. Human voice ──────────┬─ 1. Talking
 │                           ├─ 2. Shouting
 │                           ├─ 3. Large crowd
 │                           ├─ 4. Amplified speech
 │                           └─ 5. Singing*
 ├─ 8. Human movement* ──────┬─ 1. Footsteps*
 │                           └─ 2. Clapping*
 ├─ 9. Animal* ──────────────┬─ 1. Dog barking
 │                           ├─ 2. Bird chirping*
 │                           └─ 3. Insect chirping*
 ├─ 10. Water* ──────────────── 1. Hose pump*
 ├─ 11. Weather* ────────────┬─ 1. Rain*
 │                           ├─ 2. Thunder*
 │                           └─ 3. Wind*
 ├─ 12. Brake* ──────────────┬─ 1. Friction brake*
 │                           └─ 2. Exhaust brake*
 ├─ 13. Train* ──────────────── 1. Electric train*
 └─ 0. Others* ──────────────┬─ 1. Screeching*
                             ├─ 2. Plastic crinkling*
                             ├─ 3. Cleaning*
                             └─ 4. Gear*

Classes marked with an asterisk (*) are present in the SINGA:PURA taxonomy but not the SONYC taxonomy. The "Ice cream truck" class from the SONYC taxonomy has been excluded from the SINGA:PURA taxonomy because this class does not exist in the local context.

In addition, note that the label for the coarse-grained class "Others" in this repository is "0", which is different from the label "X" that is used in the full version of the SINGA:PURA dataset.
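
As a rough illustration of the "omit the absent labels" idea described above, the sketch below keeps only the event labels whose classes are not marked with an asterisk in the taxonomy tree. It is an assumption-laden example rather than the authors' procedure: the names `is_shared_with_sonyc` and `keep_shared_labels` are hypothetical, labels with fine-grained class "0" are judged by their coarse-grained class alone, and no attempt is made to translate labels into the label codes used by SONYC-UST itself.

```python
# Fine-grained labels marked with an asterisk (*) in the taxonomy tree,
# written as "<coarse label>-<fine label>".
SINGAPURA_ONLY_FINE = {
    "3-1", "3-2", "3-3", "7-5", "8-1", "8-2", "9-2", "9-3", "10-1",
    "11-1", "11-2", "11-3", "12-1", "12-2", "13-1",
    "0-1", "0-2", "0-3", "0-4",
}

# Coarse-grained classes marked with an asterisk (*) in the taxonomy tree.
SINGAPURA_ONLY_COARSE = {"8", "9", "10", "11", "12", "13", "0"}


def is_shared_with_sonyc(event_label):
    """Return True if the label's class carries no asterisk in the tree.

    Assumption: a label with fine-grained class "0" (e.g. "3-0") is
    judged by its coarse-grained class only.
    """
    coarse, fine = event_label.split("-")
    if fine == "0":
        return coarse not in SINGAPURA_ONLY_COARSE
    return event_label not in SINGAPURA_ONLY_FINE


def keep_shared_labels(event_labels):
    """Drop labels whose classes exist only in the SINGA:PURA taxonomy."""
    return [label for label in event_labels if is_shared_with_sonyc(label)]


print(keep_shared_labels(["5-3", "11-1", "9-1", "3-0", "7-5"]))
# prints ['5-3', '9-1', '3-0']
```

Note that under this rule the label "0-0" is also dropped, because it belongs to the coarse-grained class "0. Others".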

Metadata CSV file

Each row of "labelled_metadata_public.csv" corresponds to a single recording and contains the following fields:

  • "sensor_id": A string representing the identity of the sensor that the recording was taken from. Each sensor node has a unique identity. In other words, two recordings were taken from different sensors if and only if their "sensor_id" strings are different.
  • "filename": The name of the raw audio file corresponding to this row of metadata. Note that the filename already contains a timestamp, which is in the UTC+0 time zone and not the SGT (equivalent to UTC+8) time zone. However, the other metadata fields ("day", "hour", etc.) correspond to the SGT time zone (specifically, the time zone in the "timezone" column), because the sensor nodes were physically located in that time zone.
  • "year": The year that the recording was made.
  • "month": The month of the year that the recording was made.
  • "date": The date of the month that the recording was made.
  • "day": The day of the week that the recording was made (0 = Sunday, 1 = Monday, ..., 6 = Saturday).
  • "hour": The hour of the day that the recording was made, in 24-hour format.
  • "minute": The minute of the hour that the recording was made.
  • "second": The second of the minute that the recording was made.
  • "timezone": The timezone corresponding to the temporal data in the fields of the CSV file. As of the current version, this value should be "SGT" (Singapore time zone, corresponding to UTC+8) for all recordings.
  • "town": The town in which the sensor is located (either "East 1", "East 2", "West 1", or "West 2").
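
To make the relationship between the UTC timestamp in the filename and the SGT fields concrete, here is a minimal sketch that extracts the timestamp from a filename and converts it to the fields listed above. It assumes that the timestamp always appears as a bracketed field of the form "[YYYY-MM-DDTHH-MM-SSZ]", as in the example filenames in the directory tree. Whether the CSV values were derived in exactly this way is also an assumption, so the output should be cross-checked against "labelled_metadata_public.csv". The function name `sgt_fields_from_filename` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

SGT = timezone(timedelta(hours=8), "SGT")  # Singapore time, UTC+8

# Assumed pattern of the bracketed UTC timestamp inside a filename.
_TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})Z\]")


def sgt_fields_from_filename(filename):
    """Convert the UTC timestamp in `filename` to SGT metadata fields."""
    match = _TIMESTAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    utc_time = datetime.strptime(match.group(1), "%Y-%m-%dT%H-%M-%S")
    local = utc_time.replace(tzinfo=timezone.utc).astimezone(SGT)
    return {
        "year": local.year,
        "month": local.month,
        "date": local.day,
        # isoweekday() gives Monday = 1 ... Sunday = 7; the metadata
        # uses 0 = Sunday ... 6 = Saturday.
        "day": local.isoweekday() % 7,
        "hour": local.hour,
        "minute": local.minute,
        "second": local.second,
        "timezone": "SGT",
    }


EXAMPLE = (
    "[b827eb7d576e][2020-08-03T23-32-11Z][manual][---]"
    "[565a40f866f3d2804332ca7896a4c77d][93.29-86.29 66.65]!-90.flac"
)
print(sgt_fields_from_filename(EXAMPLE))
# prints {'year': 2020, 'month': 8, 'date': 4, 'day': 2, 'hour': 7,
#         'minute': 32, 'second': 11, 'timezone': 'SGT'} (on one line)
```

In this example the recording falls on 3 August in UTC but on 4 August (a Tuesday) in SGT, which is why the date in the filename and the "date" field can differ by one day.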

Labels CSV files

Each row of every CSV file in the "labels_public" folder corresponds to a single acoustic event and contains the following fields:

  • "annotator": A number in the set {1,2,3,4,5} denoting the annotator index. Each index corresponds to a unique annotator.
  • "filename": The name of the raw audio file (i.e. recording) that the annotator heard the event in. This is identical to the name of the CSV file, save for the file extension.
  • "event_label": The label of the event according to the taxonomy described in the "Label taxonomy" section, given in the format "<coarse label>-<fine label>". For example, if the event was a siren, then this would be "5-3". The fine-grained label "0" was assigned if (A) none of the fine-grained classes applied to the event but a numbered coarse-grained class (i.e. any coarse-grained class except "0. Others") did, or (B) the annotator was sure of the coarse-grained class but not of the fine-grained class that the event belonged to. For example, a non-machinery impact that was not glass breaking, a car crash, or an explosion would be given the label "3-0". If no events corresponding to coarse- or fine-grained classes in the label taxonomy were heard in a given recording, then there is a single row for that recording with the value "0-0" in this field.
  • "proximity": One of "near", "far", or "moving", corresponding to what the annotator believed was the proximity of the sound event to the sensor based on the recording. If "near" or "far", then the source is assumed to be stationary. Hence, any instances of class "6-2" are always labelled as "moving" for proximity and any instances of class "6-1" are always labelled as "near" or "far" for proximity. For events with event label "0-0", the value in this field is "NIL".
  • "onset": The starting time of the event in the audio file, given in seconds and to a precision of 3 decimal places. If the event starts from the very beginning of the track, the onset is "0.000". For events with event label "0-0", the value in this field is "0.000".
  • "offset": The ending time of the event in the audio file, given in seconds and to a precision of 3 decimal places. If the event lasts to the very end of the track, the offset is "10.000". For events with event label "0-0", the value in this field is "10.000".
  • "remarks": Any remarks made by the annotator regarding the track or the particular label for the event. This can be left blank (i.e. as an empty string) if there are no remarks for that particular label.

Since a recording can contain any number of sound events, a single CSV file may contain multiple rows (a recording with zero events has only the "0-0" row described above). In addition, every recording has labels provided by at least one annotator, and some have labels provided by more than one annotator. Labels for the same recording provided by different annotators are found in the same CSV file. Lastly, some events in the taxonomy are rare enough that they do not occur in the strongly-labelled portion of the dataset, so not all possible event labels are represented in the CSV files in the "labels_public" folder.
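
The fields above can be read with the standard library alone. The sketch below groups the events in one labels CSV file by annotator and skips the "0-0" placeholder row. It assumes that each CSV file starts with a header row whose column names match the field names listed above. The sample rows are invented for illustration and are not taken from the dataset, and the function name `events_by_annotator` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample content; the header row is an assumption.
SAMPLE = """annotator,filename,event_label,proximity,onset,offset,remarks
1,example.flac,5-3,far,0.000,4.250,
1,example.flac,7-1,near,2.500,10.000,
2,example.flac,5-3,far,0.100,4.000,
"""


def events_by_annotator(csv_text):
    """Return {annotator index: list of events} for one labels CSV file."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        annotator = int(row["annotator"])
        grouped[annotator]  # register the annotator even without events
        if row["event_label"] == "0-0":
            continue  # placeholder row: no events were heard
        coarse, fine = row["event_label"].split("-")
        grouped[annotator].append({
            "coarse": coarse,
            "fine": fine,
            "proximity": row["proximity"],
            "onset": float(row["onset"]),    # seconds from start of track
            "offset": float(row["offset"]),  # seconds from start of track
        })
    return dict(grouped)


events = events_by_annotator(SAMPLE)
print(sorted(events))   # prints [1, 2]
print(len(events[1]))   # prints 2
```

Keeping the annotators separate makes it straightforward to compare their labels for the same recording before deciding how to merge them. For a real file, the text can be obtained with `open(path, newline="").read()`.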

License and attribution

This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (a human-readable summary is available at https://creativecommons.org/licenses/by-sa/4.0/ and the legal document for the license is available at https://creativecommons.org/licenses/by-sa/4.0/legalcode).

When attributing the dataset, please acknowledge Kenneth Ooi, Karn Watcharasupat, Santi Peksi, Furi Andi Karnapi, Zhen-Ting Ong, Danny Chua, Hui-Wen Leow, Li-Long Kwok, Xin-Lei Ng, Zhen-Ann Loh, and Woon-Seng Gan. Alternatively, if you are using the dataset in an academic publication, you may want to cite our conference paper instead:

K. Ooi, K. N. Watcharasupat, S. Peksi, F. A. Karnapi, Z.-T. Ong, D. Chua, H.-W. Leow, L.-L. Kwok, X.-L. Ng, Z.-A. Loh, W.-S. Gan, "A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context," in 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2021.

Contact

Please feel free to drop an email to Kenneth Ooi at wooi002@e.ntu.edu.sg for questions and issues regarding the dataset.

Version history

v1.0a: Initial upload of dataset (6547 labelled)

Files (4.5 GB)

md5:4e0a5437b87d8db9c343adce8ef25cc8 (544.9 kB)
md5:98477daca861c6950cc8b620cecc286d (1.0 GB)
md5:873b26cfe25bb3084e39e5af0dfebcad (1.0 GB)
md5:71322a5c3ba33badfdd8e25c9ebf559a (1.0 GB)
md5:a6d087babea1f797af99b81cb7c7ea4a (1.0 GB)
md5:bdd46cc5e9187e97c37989b3b73e786e (302.2 MB)
md5:c5beb6374e55abfe7cd50f4f498c8376 (903.8 kB)
md5:535242cf1094d95d086fc574874e9ddf (3.4 MB)
md5:f9296e5a796daaf5e9452fb9fd8ddae4 (15.4 kB)

Additional details

Related works

Is supplement to
Dataset: 10.21979/N9/Y8UQ6F (DOI)
References
Dataset: https://zenodo.org/record/3966543#.YTJBYN8RXZQ (URL)