Dataset Open Access

SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network

Mark Cartwright; Jason Cramer; Ana Elisa Mendez Mendez; Yu Wang; Ho-Hsiang Wu; Vincent Lostanlen; Magdalena Fuentes; Graham Dove; Charlie Mydlarz; Justin Salamon; Oded Nov; Juan Pablo Bello

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.3966543</identifier>
      <creatorName>Mark Cartwright</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-5908-390X</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Jason Cramer</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-5288-9399</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Ana Elisa Mendez Mendez</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-4861-5616</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Yu Wang</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-1615-5141</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Ho-Hsiang Wu</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-1102-074X</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Vincent Lostanlen</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0003-0580-1651</nameIdentifier>
      <affiliation>Cornell University</affiliation>
      <creatorName>Magdalena Fuentes</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0003-4506-6639</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Graham Dove</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0002-3551-0209</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Charlie Mydlarz</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-7061-0638</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Justin Salamon</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-6345-4593</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Oded Nov</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-6410-2995</nameIdentifier>
      <affiliation>New York University</affiliation>
      <creatorName>Juan Pablo Bello</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0001-8561-5204</nameIdentifier>
      <affiliation>New York University</affiliation>
    <title>SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network</title>
    <subject>urban sound</subject>
    <subject>noise pollution</subject>
    <subject>machine listening</subject>
    <subject>computer audition</subject>
    <subject>longterm spatiotemporal context</subject>
    <subject>sound tagging</subject>
    <date dateType="Issued">2020-09-14</date>
  <resourceType resourceTypeGeneral="Dataset"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.2590741</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Version 2.3, September 2020&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Created by&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mark Cartwright (1,2,3), Jason Cramer (1), Ana Elisa Mendez Mendez (1), Yu Wang (1), Ho-Hsiang Wu (1), Vincent Lostanlen (1,2,4), Magdalena Fuentes (1), Graham Dove (2), Charlie Mydlarz (1,2), Justin Salamon (5), Oded Nov (6), Juan Pablo Bello (1,2,3)&lt;/p&gt;

	&lt;li&gt;Music and Audio Research Lab, New York University&lt;/li&gt;
	&lt;li&gt;Center for Urban Science and Progress, New York University&lt;/li&gt;
	&lt;li&gt;Department of Computer Science and Engineering, New York University&lt;/li&gt;
	&lt;li&gt;Cornell Lab of Ornithology&lt;/li&gt;
	&lt;li&gt;Adobe Research&lt;/li&gt;
	&lt;li&gt;Department of Technology Management and Innovation, New York University&lt;/li&gt;



&lt;p&gt;If using this data in an academic work, please reference the DOI and version, as well as cite the following paper, which presented the data collection procedure and the first version of the dataset:&lt;/p&gt;

&lt;p&gt;Cartwright, M., Cramer, J., Mendez, A.E.M., Wang, Y., Wu, H., Lostanlen, V., Fuentes, M., Dove, G., Mydlarz, C., Salamon, J., Nov, O., Bello, J.P. SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context. In &lt;em&gt;Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)&lt;/em&gt;, 2020.&lt;br&gt;
&lt;a href=""&gt;[pdf]&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for realistic urban noise monitoring. The audio was recorded from the &lt;a href=""&gt;SONYC&lt;/a&gt;&amp;nbsp;acoustic sensor network. Volunteers on the &lt;a href=""&gt;Zooniverse&lt;/a&gt;&amp;nbsp;citizen science platform tagged the presence of 23 classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into 8 coarse-grained classes. The recordings are split into three sets: training, validation, and test. The training and validation sets are disjoint with respect to the sensor from which each recording came, and the test set is displaced in time. For increased reliability, three volunteers annotated each recording. In addition, members of the SONYC team subsequently created a subset of verified, ground-truth tags using a two-stage annotation procedure in which two annotators independently tagged and then collectively resolved any disagreements. This subset of recordings with verified annotations intersects with all three recording splits. All of the recordings in the test set have these verified annotations. In v2 of this dataset, we have also included coarse spatiotemporal context information to aid in tag prediction when time and location are known. For more details on the motivation and creation of this dataset, see the &lt;a href=""&gt;DCASE 2020 Urban Sound Tagging with Spatiotemporal Context Task website&lt;/a&gt;.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Audio data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The provided audio has been acquired using the SONYC acoustic sensor network for urban noise pollution monitoring. Over 60 different sensors have been deployed in New York City, and these sensors have collectively gathered the equivalent of over 50 years of audio data, of which we provide a small subset. The data was sampled by selecting the nearest neighbors on VGGish features of recordings known to have classes of interest. All recordings are 10 seconds and were recorded with identical microphones at identical gain settings. To maintain privacy, we quantized the spatial information to the level of a city block, and we quantized the temporal information to the level of an hour. We also limited the occurrence of recordings with positive human voice annotations to one per hour per sensor.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Label taxonomy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The label taxonomy is as follows:&lt;/p&gt;

	1: small-sounding-engine&lt;br&gt;
	2: medium-sounding-engine&lt;br&gt;
	3: large-sounding-engine&lt;br&gt;
	X: engine-of-uncertain-size&lt;/li&gt;
	1: rock-drill&lt;br&gt;
	2: jackhammer&lt;br&gt;
	3: hoe-ram&lt;br&gt;
	4: pile-driver&lt;br&gt;
	X: other-unknown-impact-machinery&lt;/li&gt;
	1: non-machinery-impact&lt;/li&gt;
	1: chainsaw&lt;br&gt;
	2: small-medium-rotating-saw&lt;br&gt;
	3: large-rotating-saw&lt;br&gt;
	X: other-unknown-powered-saw&lt;/li&gt;
	1: car-horn&lt;br&gt;
	2: car-alarm&lt;br&gt;
	3: siren&lt;br&gt;
	4: reverse-beeper&lt;br&gt;
	X: other-unknown-alert-signal&lt;/li&gt;
	1: stationary-music&lt;br&gt;
	2: mobile-music&lt;br&gt;
	3: ice-cream-truck&lt;br&gt;
	X: music-from-uncertain-source&lt;/li&gt;
	1: person-or-small-group-talking&lt;br&gt;
	2: person-or-small-group-shouting&lt;br&gt;
	3: large-crowd&lt;br&gt;
	4: amplified-speech&lt;br&gt;
	X: other-unknown-human-voice&lt;/li&gt;
	1: dog-barking-whining&lt;/li&gt;

&lt;p&gt;The classes preceded by an &lt;code&gt;X&lt;/code&gt; code indicate when an annotator was able to identify the coarse class, but couldn&amp;rsquo;t identify the fine class because either they were uncertain which fine class it was or the fine class was not included in the taxonomy. &lt;code&gt;dcase-ust-taxonomy.yaml&lt;/code&gt; contains this taxonomy in an easily machine-readable form.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Data splits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This release contains a training subset (13538 recordings from 35 sensors), a validation subset (4308 recordings from 9 sensors), and a test subset (669 recordings from 48 sensors). The training and validation subsets are disjoint with respect to the sensor from which each recording came. The sensors in the test set are not disjoint from those in the training and validation subsets, but the test recordings are displaced in time, occurring after any of the recordings in the training and validation subsets. The subset of recordings with verified annotations (1380 recordings) intersects with all three recording splits. All of the recordings in the test set have these verified annotations.&lt;/p&gt;
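
&lt;p&gt;The sensor relationship between the subsets can be checked with a few lines of Python. This is a minimal sketch on made-up rows, not part of the dataset tooling; the column names &lt;code&gt;split&lt;/code&gt; and &lt;code&gt;sensor_id&lt;/code&gt; and the sensor IDs are assumptions made for illustration.&lt;/p&gt;

```python
# Hypothetical rows standing in for annotations.csv. The column names
# "split" and "sensor_id" are assumptions, as are the sensor IDs.
rows = [
    {"split": "train", "sensor_id": "0"},
    {"split": "train", "sensor_id": "1"},
    {"split": "validate", "sensor_id": "2"},
    {"split": "test", "sensor_id": "1"},
    {"split": "test", "sensor_id": "2"},
]


def sensors_by_split(rows):
    """Collect the set of sensor IDs seen in each split."""
    sensors = {}
    for row in rows:
        sensors.setdefault(row["split"], set()).add(row["sensor_id"])
    return sensors


sensors = sensors_by_split(rows)
# Training and validation are expected to share no sensors.
train_val_overlap = sensors["train"].intersection(sensors["validate"])
# Test sensors may also appear in the other two subsets.
test_overlap = sensors["test"].intersection(
    sensors["train"].union(sensors["validate"])
)
print(sorted(train_val_overlap), sorted(test_overlap))
```

&lt;p&gt;On the real file, the same function could be fed rows read with &lt;code&gt;csv.DictReader&lt;/code&gt;, provided the column names match.&lt;/p&gt;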


&lt;p&gt;&lt;strong&gt;Annotation data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The annotation data are&amp;nbsp;contained in &lt;code&gt;annotations.csv&lt;/code&gt;, and&amp;nbsp;encompass the training, validation, and test subsets. Each row in the file represents one multi-label annotation of a recording&amp;mdash;it could be the annotation of a single citizen science volunteer, a single SONYC team member, or the agreed-upon ground truth by the SONYC team (see the &lt;em&gt;annotator_id&lt;/em&gt; column description for more information).&amp;nbsp; Note that since the SONYC team members annotated each class group separately, there may be multiple annotation rows by a single SONYC team annotator for a particular audio recording.&lt;/p&gt;





&lt;p&gt;The data split (&lt;em&gt;train&lt;/em&gt;, &lt;em&gt;validate&lt;/em&gt;, or &lt;em&gt;test&lt;/em&gt;).&lt;/p&gt;


&lt;p&gt;The ID of the sensor the recording is from.&lt;/p&gt;


&lt;p&gt;The filename of the audio recording.&lt;/p&gt;


&lt;p&gt;The anonymous ID of the annotator. If this value is positive, it is a citizen science volunteer from the Zooniverse platform. If it is negative, it is a SONYC team member. If it is &lt;code&gt;0&lt;/code&gt;, then it is the ground truth agreed-upon by the SONYC team.&lt;/p&gt;
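
&lt;p&gt;The sign convention for &lt;code&gt;annotator_id&lt;/code&gt; can be expressed as a small helper. This is an illustrative sketch; the function name and the returned labels are hypothetical and do not come from the dataset.&lt;/p&gt;

```python
def annotator_kind(annotator_id):
    """Map an annotator_id value to the kind of annotator it denotes."""
    annotator_id = int(annotator_id)  # CSV readers usually yield strings
    if annotator_id > 0:
        return "volunteer"      # Zooniverse citizen science volunteer
    if annotator_id == 0:
        return "ground_truth"   # agreed-upon by the SONYC team
    return "team_member"        # negative: an individual SONYC team member


print([annotator_kind(v) for v in ("17", "0", "-3")])
```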


&lt;p&gt;The year the recording is from.&lt;/p&gt;


&lt;p&gt;The week of the year the recording is from.&lt;/p&gt;


&lt;p&gt;The day of the week the recording is from, with Monday as the start (i.e. &lt;code&gt;0&lt;/code&gt;=Monday).&lt;/p&gt;


&lt;p&gt;The hour of the day the recording is from.&lt;/p&gt;
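
&lt;p&gt;The four temporal fields can be combined into an hour-resolution timestamp. The sketch below assumes that the week field follows ISO 8601 week numbering, which this description does not state, so treat the result as approximate. The function name is hypothetical, and &lt;code&gt;date.fromisocalendar&lt;/code&gt; requires Python 3.8 or later.&lt;/p&gt;

```python
from datetime import date, datetime, time


def approximate_timestamp(year, week, day, hour):
    """Rebuild an hour-resolution timestamp from the four context fields.

    Assumption: week follows ISO 8601 numbering; the description does not say.
    """
    # fromisocalendar expects Monday=1, whereas the dataset uses Monday=0.
    d = date.fromisocalendar(int(year), int(week), int(day) + 1)
    return datetime.combine(d, time(hour=int(hour)))


# Made-up field values for illustration.
print(approximate_timestamp(2019, 10, 0, 14))
```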

&lt;p&gt;The NYC borough in which the sensor is located (&lt;code&gt;1&lt;/code&gt;=Manhattan, &lt;code&gt;3&lt;/code&gt;=Brooklyn, &lt;code&gt;4&lt;/code&gt;=Queens). This corresponds to the first digit in the 10-digit NYC parcel number system known as Borough, Block, Lot (BBL).&lt;/p&gt;


&lt;p&gt;The NYC block in which the sensor is located. This corresponds to digits 2 to 6 in the 10-digit NYC parcel number system known as Borough, Block, Lot (BBL).&lt;/p&gt;
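
&lt;p&gt;For readers joining these fields against other NYC data, the sketch below shows how a full 10-digit BBL breaks down into borough (digit 1), block (digits 2 to 6), and lot (the remaining four digits). The parcel number is made up and the helper name is hypothetical; the dataset itself provides only the borough and block.&lt;/p&gt;

```python
BOROUGHS = {1: "Manhattan", 3: "Brooklyn", 4: "Queens"}  # codes listed above


def split_bbl(bbl):
    """Split a 10-digit BBL string into (borough, block, lot) integers."""
    bbl = str(bbl)
    if len(bbl) != 10 or not bbl.isdigit():
        raise ValueError("expected a 10-digit BBL, got %r" % bbl)
    return int(bbl[0]), int(bbl[1:6]), int(bbl[6:])


borough, block, lot = split_bbl("1005420001")  # made-up parcel number
print(BOROUGHS[borough], block, lot)
```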


&lt;p&gt;The latitude coordinate of the &lt;strong&gt;block&lt;/strong&gt;&amp;nbsp;in which the sensor is located.&lt;/p&gt;


&lt;p&gt;The longitude coordinate of the &lt;strong&gt;block&lt;/strong&gt;&amp;nbsp;in which the sensor is located.&lt;/p&gt;


&lt;p&gt;Columns of this form indicate the presence of a fine-level class. &lt;code&gt;1&lt;/code&gt; if present, &lt;code&gt;0&lt;/code&gt; if not present. If &lt;code&gt;-1&lt;/code&gt;, then the class was not labeled in this annotation because the annotation was performed by a SONYC team member who only annotated one coarse group of classes at a time when annotating the verified subset.&lt;/p&gt;


&lt;p&gt;Columns of this form indicate the presence of a coarse-level class. &lt;code&gt;1&lt;/code&gt; if present, &lt;code&gt;0&lt;/code&gt; if not present. If &lt;code&gt;-1&lt;/code&gt;, then the class was not labeled in this annotation because the annotation was performed by a SONYC team member who only annotated one coarse group of classes at a time when annotating the verified subset. These columns are computed from the fine-level class presence columns and are presented here for convenience when training on only coarse-level classes.&lt;/p&gt;
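
&lt;p&gt;One plausible way to derive a coarse-level value from the fine-level values of the same group is sketched below. The exact rule used to build the released columns is not given here, so the handling of &lt;code&gt;-1&lt;/code&gt; in particular is an assumption, and the function name is hypothetical.&lt;/p&gt;

```python
def coarse_presence(fine_values):
    """Derive a coarse-level presence value from its fine-level values.

    Assumed rule: 1 if any fine class is present, -1 if the whole group
    was left unlabeled, otherwise 0.
    """
    values = [int(v) for v in fine_values]
    if any(v == 1 for v in values):
        return 1    # at least one fine class marked present
    if all(v == -1 for v in values):
        return -1   # group not labeled in this annotation
    return 0        # labeled, but no fine class present


print(coarse_presence([0, 1, 0]), coarse_presence([0, 0]), coarse_presence([-1, -1]))
```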


&lt;p&gt;Columns of this form indicate the proximity of a fine-level class. After indicating the presence of a fine-level class, citizen science annotators were asked to indicate the proximity of the sound event to the sensor. Only the citizen science volunteers performed this task, and therefore this data is not included in the verified annotations. This column may take on one of the following four values: (&lt;code&gt;near&lt;/code&gt;, &lt;code&gt;far&lt;/code&gt;, &lt;code&gt;notsure&lt;/code&gt;, &lt;code&gt;-1&lt;/code&gt;). If &lt;code&gt;-1&lt;/code&gt;, then the proximity was not annotated because either the annotation was not performed by a citizen science volunteer, or the citizen science volunteer did not indicate the presence of the class.&lt;/p&gt;
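
&lt;p&gt;When summarizing proximity, the &lt;code&gt;-1&lt;/code&gt; placeholder should be excluded so that it is not mistaken for a response. A minimal sketch on made-up values, with a hypothetical helper name:&lt;/p&gt;

```python
from collections import Counter

PROXIMITY_VALUES = ("near", "far", "notsure")


def proximity_counts(values):
    """Count annotated proximity values, skipping the -1 placeholder."""
    return Counter(v for v in values if v in PROXIMITY_VALUES)


# Made-up column values as they would be read from a CSV file (strings).
counts = proximity_counts(["near", "-1", "far", "near", "notsure", "-1"])
print(counts["near"], counts["far"], counts["notsure"])
```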


&lt;p&gt;&lt;strong&gt;Conditions of use&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dataset created by Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, and Juan Pablo Bello&lt;/p&gt;

&lt;p&gt;The SONYC-UST dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license:&lt;br&gt;
&lt;a href=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dataset and its contents are made available on an &amp;ldquo;as is&amp;rdquo; basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, New York University is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the SONYC-UST dataset or any part of it.&lt;/p&gt;



&lt;p&gt;Please help us improve SONYC-UST&amp;nbsp;by sending your feedback to:&lt;/p&gt;

	&lt;li&gt;Mark Cartwright: &lt;a href=""&gt;&lt;/a&gt;&lt;/li&gt;

&lt;p&gt;In case of a problem, please include as many details as possible.&lt;/p&gt;



&lt;p&gt;We would like to thank all the Zooniverse volunteers who continue to contribute to our project. This work is supported by &lt;a href=""&gt;National Science Foundation award 1544753&lt;/a&gt;.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Change log&lt;/strong&gt;&lt;/p&gt;

	&lt;li&gt;2.3 Added the ground truth annotations for the test set, and regrouped the audio files for upload to Zenodo.&lt;/li&gt;
	&lt;li&gt;2.2&amp;nbsp;Added the audio for the test set (audio-eval.tar.gz).&lt;/li&gt;
	&lt;li&gt;2.1 The DCASE 2020 development dataset. 14778 new recordings added along with coarse spatiotemporal context information.&lt;/li&gt;
	&lt;li&gt;1.0 Data is the same as v0.4. Publication added to README.&lt;/li&gt;
	&lt;li&gt;0.4 Fixed error in annotations. Previously, the coarse class &amp;quot;machinery-impact&amp;quot; was accidentally indicated as present whenever &amp;quot;non-machinery-impact&amp;quot; was present regardless of the presence of &amp;quot;machinery-impact&amp;quot;. This error has been fixed.&lt;/li&gt;
	&lt;li&gt;0.3 Test set annotations added&lt;/li&gt;
	&lt;li&gt;0.2 Test set audio files added&lt;/li&gt;
    <description descriptionType="Other">This work is supported by National Science Foundation award 1544753.</description>