Lorient-1k
Description
Created by Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes
Contact: mathieu.lagrange@cnrs.fr
If used for research, please cite:
@article{gontier2021training, title={Polyphonic training set synthesis improves self-supervised urban sound classification}, author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier}, journal={The Journal of the Acoustical Society of America}, year={2021}, publisher={Acoustical Society of America} }
Lorient-1k contains 30 acoustic scenes, each 45 seconds long.
These scenes were recorded with Zoom H4n handheld devices at 10 different locations in Lorient (France).
Four experts annotated the onset and offset times of three sources of interest: traffic, voice, and birds. These annotations were combined into a single annotation that is consistent with the perceived time of presence: for each scene and each source, the sum of activations matches the perceived time of presence.
The total duration of the dataset is about 1,350 seconds (30 scenes of 45 seconds), i.e., 22.5 minutes.
The audio is provided as third-octave spectral data and as mel spectrograms (as used by YAMNet). See demoTob.zip for a Python implementation of the third-octave computation from audio.
From a Python interpreter:
>> import numpy as np
>> s=np.load('Lorient-1k_spectralData.npy')
>> print(s.shape)
(30, 351, 29)
The three dimensions correspond respectively to the sceneId, the frameId (time), and the spectralId (frequency).
>> a=np.load('Lorient-1k_presence.npy')
>> print(a.shape)
(30, 344, 3)
The three dimensions correspond respectively to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). The annotation is a binary indicator of source presence over one second, that is, over 8 consecutive 125 ms frames, computed with a hop of one frame.
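The two frame counts printed above are consistent with this windowing: sliding a window of 8 frames with a hop of one frame over 351 spectral frames gives 351 - 8 + 1 = 344 windows. The sketch below spells out that arithmetic. The function names are hypothetical, and two points are assumptions that the description does not state: that presence frame i covers spectral frames i to i+7, and that the 125 ms frames are contiguous and non-overlapping.

```python
FRAME_DURATION_S = 0.125  # 125 ms per frame, as stated in the description
WINDOW_FRAMES = 8         # one second of presence = 8 consecutive frames


def num_windows(n_frames, window=WINDOW_FRAMES, hop=1):
    """Number of full windows that fit in n_frames (hypothetical helper)."""
    if n_frames < window:
        return 0
    return (n_frames - window) // hop + 1


def window_frames(index, window=WINDOW_FRAMES, hop=1):
    """Spectral frame indices covered by presence window `index`.

    Assumes window `index` starts at spectral frame `index * hop`.
    """
    start = index * hop
    return range(start, start + window)


def window_span_seconds(index, window=WINDOW_FRAMES, hop=1,
                        frame_duration=FRAME_DURATION_S):
    """Start and end time of a presence window, assuming contiguous frames."""
    start = index * hop * frame_duration
    return start, start + window * frame_duration


print(num_windows(351))         # 344
print(list(window_frames(0)))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(window_span_seconds(0))   # (0.0, 1.0)
```

Under these assumptions, the last presence frame (index 343) covers spectral frames 343 to 350, so every spectral frame falls inside at least one window.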
>> a=np.load('Lorient-1k_time_of_presence.npy')
The time of presence is expressed in percent, per scene and per source.
>> print(a.shape)
(30, 3)
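One plausible way to relate the two annotation files is to take, for each scene and each source, the fraction of active presence frames. The sketch below does this on a small synthetic array with the documented (scene, frame, source) layout. Whether it reproduces the values shipped in Lorient-1k_time_of_presence.npy exactly is an assumption, not something the description confirms, and the function name is hypothetical.

```python
import numpy as np


def time_of_presence_percent(presence):
    """Percentage of active frames per scene and per source.

    presence: binary array of shape (n_scenes, n_frames, n_sources).
    Returns an array of shape (n_scenes, n_sources) with values in [0, 100].
    """
    presence = np.asarray(presence, dtype=float)
    return 100.0 * presence.mean(axis=1)  # average over the time axis


# Synthetic stand-in data, not taken from the dataset.
demo = np.zeros((2, 4, 3))
demo[0, :2, 0] = 1   # source 0 active in half of scene 0's frames
demo[1, :, 2] = 1    # source 2 active in all of scene 1's frames

top = time_of_presence_percent(demo)
print(top.shape)   # (2, 3)
print(top[0, 0])   # 50.0
print(top[1, 2])   # 100.0
```

Applied to the array loaded from Lorient-1k_presence.npy, the same call would return a (30, 3) array that can be compared against the provided time of presence.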
The audio files are also available as 16-bit, 44.1 kHz WAV files. They are named in the same order as the first dimension of the .npy files: for 00x.wav, the third-octave data and the presence annotation are accessed using s[x-1, :, :] and a[x-1, :, :]. The time of presence array is two-dimensional and is indexed as a[x-1, :].
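The mapping between file names and array indices can be sketched as follows. The helper names are hypothetical, and the sketch assumes that all 30 files use three-digit, zero-padded, one-based names (001.wav to 030.wav), which the 00x.wav pattern suggests but does not guarantee.

```python
from pathlib import Path


def scene_index(wav_name):
    """Zero-based index into the first axis of the .npy arrays.

    Assumes one-based file numbering, e.g. "007.wav" -> 6.
    """
    number = int(Path(wav_name).stem)
    if number < 1:
        raise ValueError(f"expected a one-based file number, got {wav_name!r}")
    return number - 1


def wav_name(index):
    """File name for a zero-based scene index (assumed 3-digit padding)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{index + 1:03d}.wav"


print(scene_index("007.wav"))   # 6
print(wav_name(29))             # 030.wav
```

With s and a loaded as shown earlier, s[scene_index("007.wav")] would then select the third-octave data for that recording.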
Files (57.5 MB in total, including audio.zip)
Size | MD5 |
---|---|
55.0 MB | 38e35c67be3924dbd70480dc263269c1 |
2.4 MB | 3fee608d254ba5f539ad450fa5ae3f98 |
848 Bytes | 1a15f51bb65505b2501475a20c652913 |
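The MD5 checksums listed for the files can be used to check that a download is intact. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and the path and expected checksum passed to them are left to the reader, since the listing does not pair every checksum with a file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling matches_checksum with the path of a downloaded file and the corresponding checksum from the listing returns True when the file is intact.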