Published July 17, 2019 | Version 0.1.1
Dataset Open

MAVD-traffic dataset

  • 1. Universidad de la República, Facultad de Ingeniería

Description

This is a dataset for sound event detection in urban environments, the first of a series of datasets planned within an ongoing research project on urban noise monitoring in Montevideo, Uruguay. The dataset is called MAVD, for Montevideo Audio and Video Dataset. This release focuses on traffic noise, hence the name MAVD-traffic, because traffic is usually the predominant noise source in urban environments. Apart from audio recordings, it also includes synchronized video files. The sound event annotations follow an ontology for traffic sounds that combines two taxonomies, vehicle types (e.g. car, bus) and vehicle components (e.g. engine, brakes), with a set of actions related to them (e.g. idling, accelerating). The proposed ontology thus allows for a flexible and detailed description of traffic sounds. Since the taxonomies are hierarchical, the ontology can be used at different levels of detail.
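As a rough illustration of how a hierarchical label can be read at different levels of detail, the sketch below collapses a full label to a coarser one. The label layout "<vehicle>/<component>_<action>" and the function names split_label and coarsen are assumptions made for this example only; the annotation files in the dataset may encode labels differently.

```python
# Hypothetical label layout: "<vehicle>/<component>_<action>",
# e.g. "bus/engine_accelerating". This layout is an assumption for
# illustration; check the annotation files for the real format.

def split_label(label):
    """Split a label into (vehicle, component, action); missing parts are None."""
    vehicle, _, rest = label.partition("/")
    component, _, action = rest.partition("_")
    return (vehicle or None, component or None, action or None)


def coarsen(label, level):
    """Return the label reduced to the requested level of detail."""
    vehicle, component, _action = split_label(label)
    if level == "vehicle":
        return vehicle
    if level == "component":
        return component
    if level == "vehicle+component":
        return f"{vehicle}/{component}" if component else vehicle
    if level == "full":
        return label
    raise ValueError(f"unknown level: {level!r}")
```

With this layout, coarsen("bus/engine_accelerating", "vehicle") gives "bus", so the same annotation can feed a vehicle-type classifier or a more detailed component-level one.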

The dataset was presented in: Pablo Zinemanas, Pablo Cancela and Martín Rocamora. "MAVD: a dataset for sound event detection in urban environments." DCASE 2019 Workshop, 25-26 October 2019, New York, USA

When MAVD-traffic is used for academic research, we would highly appreciate it if scientific publications cite the previous paper.

 

Notes

NOTE: version 0.1.1 adds a metadata.txt file with the location (latitude, longitude) and a time stamp (year, month, day, hour, minute) of each recording.

The recordings were made in Montevideo, the capital city of Uruguay, which has a population of 1.4 million people. Four locations are included in this release of the dataset, corresponding to different levels of traffic activity and social use characteristics. The sound was captured with a SONY PCM-D50 recorder at a sampling rate of 48 kHz and a resolution of 24 bits. The video was recorded with a GoPro Hero 3 camera at a rate of 30 frames per second and a resolution of 1920×1080 pixels. Audio and video files about 15 minutes long were recorded at different times of the day in the different locations.

The ELAN software was used to manually annotate the recordings of the dataset. The annotations follow an ontology for traffic sounds that arises from the combination of two taxonomies, vehicle types (e.g. car, bus) and vehicle components (e.g. engine, brakes), with a set of actions related to them (e.g. idling, accelerating).
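A quick sanity check after download is to compare an audio file's header with the recording format stated in these notes (48 kHz, 24 bits). The sketch below does this with Python's standard wave module. It assumes the audio is available as uncompressed PCM WAV, which this record does not confirm; files in another container would first need converting. The function name check_audio_format is hypothetical.

```python
import wave

EXPECTED_RATE_HZ = 48_000        # sampling rate stated in the notes
EXPECTED_SAMPLE_WIDTH_BYTES = 3  # 24-bit resolution = 3 bytes per sample


def check_audio_format(path):
    """Read a PCM WAV header and compare it with the stated recording format.

    Assumes an uncompressed PCM WAV file (an assumption, not confirmed by
    the dataset record).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        n_frames = wav.getnframes()
    return {
        "rate_ok": rate == EXPECTED_RATE_HZ,
        "width_ok": width == EXPECTED_SAMPLE_WIDTH_BYTES,
        "duration_s": n_frames / rate,
    }
```

The returned duration_s can also be compared against the roughly 15-minute length mentioned for the original recordings, keeping in mind that distributed files may be segments of different length.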

Files

annotations_test.zip

Files (1.3 GB)

MD5 checksum                        Size
94e609cad8702f5fb6afe52815547b67    12.8 kB
ea67a330ab695873a12a07ed494eae42    19.8 kB
30edae5c398dffe52a2ee1373e89fd96    6.2 kB
824b035fa76dc83919e312105db4214f    402.2 MB
c7a897623364f80c8ff3775d8dd601a2    525.5 MB
d220cef03f99b2d108d708421240f86d    152.3 MB
593cad71171d713775e102bb59d909da    2.9 kB
f71d547c558d007593e8ec5b70aa696c    2.5 kB
85413e8505f1f26672eb43802ba1c465    72.3 MB
df0a1eef1dd89be72ae805e815c0380c    136.9 MB
3fcf748f10432dff1a75442ee82ccf7c    38.1 MB
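
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and verify_download are hypothetical, and the file is read in chunks so that the larger archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the checksum from the record."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download(local_path, listed_md5) returns True when local_path is an intact copy of the file whose checksum is listed_md5, where both arguments are supplied by the user from their download and from the list above.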

Additional details

References

  • Pablo Zinemanas, Pablo Cancela and Martín Rocamora. MAVD: a dataset for sound event detection in urban environments. DCASE 2019 Workshop, 25-26 October 2019, New York, USA