Published December 15, 2023 | Version v4
Dataset Open

Lorient-1k

  • 1. LS2N

Description

Created by Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes

Contact: mathieu.lagrange@cnrs.fr

If used for research, please cite:

@article{gontier2021training,
  title={Polyphonic training set synthesis improves self-supervised urban sound classification},
  author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier},
  journal={The Journal of the Acoustical Society of America},
  year={2021},
  publisher={Acoustical Society of America}
}

Lorient-1k contains 30 acoustic scenes, each 45 seconds long.
These scenes were recorded with Zoom H4n handheld devices at 10 different locations in Lorient (France).
Four experts annotated the onset and offset times of three sources of interest: traffic, voice, and birds. Their annotations were combined into a single annotation that is consistent with the notion of perceived time of presence: the sum of activations per scene and per source matches the perceived time of presence.


The total duration of the dataset is 1,350 seconds (30 scenes of 45 seconds), i.e., 22.5 minutes.

The audio is provided as third-octave spectral data and mel spectrograms (as used by YAMNet). See demoTob.zip for a Python implementation of the third-octave computation from audio.

From a Python interpreter:

>>> import numpy as np
>>> s = np.load('Lorient-1k_spectralData.npy')
>>> print(s.shape)
(30, 351, 29)

The three dimensions correspond respectively to the sceneId, the frameId (time), and the spectralId (frequency).
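To place the spectral frames on a time axis, the frame index can be converted to seconds. The sketch below is illustrative only: it assumes that the 125 ms frame duration given for the annotations also applies to the spectral data and that frames do not overlap. A zero-filled array with the documented shape stands in for the file so that the snippet runs without the dataset.

```python
import numpy as np

FRAME_S = 0.125  # assumed frame duration in seconds (125 ms)

# Stand-in with the documented shape. With the dataset at hand, use
# s = np.load('Lorient-1k_spectralData.npy') instead.
s = np.zeros((30, 351, 29))

n_scenes, n_frames, n_bands = s.shape

# Start time of each frame in seconds (assumes non-overlapping frames).
frame_start_s = np.arange(n_frames) * FRAME_S

print(n_scenes, n_frames, n_bands)
print(frame_start_s[:3])
```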

>>> a = np.load('Lorient-1k_presence.npy')
>>> print(a.shape)
(30, 344, 3)

The three dimensions correspond respectively to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). The annotation is provided as a binary indicator of source presence over one second, that is, 8 consecutive 125 ms frames, with a hop of one frame.
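The difference between the 351 spectral frames and the 344 annotation frames follows from the one-second window: sliding 8 frames with a hop of one frame over 351 frames gives 351 - 8 + 1 = 344 positions. The sketch below groups the spectral frames into such windows with NumPy's `sliding_window_view`. That annotation frame i covers spectral frames i to i+7 is an assumption, not something the description states, and zero-filled arrays stand in for the files.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 8  # frames per annotation window (8 x 125 ms = 1 s)

s = np.zeros((30, 351, 29))  # stand-in for Lorient-1k_spectralData.npy
a = np.zeros((30, 344, 3))   # stand-in for Lorient-1k_presence.npy

# Number of 8-frame windows with a hop of one frame.
n_windows = s.shape[1] - WINDOW + 1
print(n_windows == a.shape[1])  # True

# Read-only view of every window; the window axis is appended last.
# Assumed alignment: windows[:, i] pairs with a[:, i].
windows = sliding_window_view(s, WINDOW, axis=1)
print(windows.shape)  # (30, 344, 29, 8)
```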

>>> a = np.load('Lorient-1k_time_of_presence.npy')
>>> print(a.shape)
(30, 3)

The time of presence is expressed as a percentage, per scene and per source.
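Since the sum of activations per scene and per source is described as consistent with the time of presence, a percentage of this kind can be sketched as the share of annotation frames in which a source is active. This is only an illustration on random toy data: the description does not state that the distributed file was computed exactly this way.

```python
import numpy as np

# Toy binary presence array with the documented layout
# (scene, frame, source); not the real annotations.
rng = np.random.default_rng(0)
presence = (rng.random((30, 344, 3)) > 0.5).astype(float)

# Percentage of annotation frames in which each source is active,
# per scene and per source.
time_of_presence = 100.0 * presence.mean(axis=1)
print(time_of_presence.shape)  # (30, 3)
```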

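The audio files are also available as 16-bit, 44.1 kHz WAV files. Audio files are named in the same order as the first dimension of the .npy files: for 00x.wav, the third-octave data are accessed using s[x-1, :, :], the presence annotations using a[x-1, :, :] (with a loaded from Lorient-1k_presence.npy), and the time of presence using a[x-1, :] (with a loaded from Lorient-1k_time_of_presence.npy, which has only two dimensions).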
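The naming rule can be wrapped in a small helper. `scene_index` is a hypothetical name introduced here for illustration, and the sketch assumes that file names consist of the 1-based scene number followed by `.wav`, as the 00x.wav pattern suggests.

```python
from pathlib import Path


def scene_index(filename: str) -> int:
    """Map an audio file name such as '007.wav' to its 0-based
    index along the first dimension of the .npy arrays."""
    # The file name stem is assumed to be the 1-based scene number.
    return int(Path(filename).stem) - 1


print(scene_index('007.wav'))  # 6
```

With the arrays loaded, `s[scene_index('007.wav'), :, :]` would then select the third-octave data of that scene.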

Files

audio.zip

Files (57.5 MB)

Size        Checksum
55.0 MB     md5:38e35c67be3924dbd70480dc263269c1
2.4 MB      md5:3fee608d254ba5f539ad450fa5ae3f98
848 Bytes   md5:1a15f51bb65505b2501475a20c652913
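The md5 values listed above can be used to check a download. The helper names `md5_of_file` and `matches_listing` below are hypothetical, and the sketch relies only on Python's standard `hashlib` module. The listing here does not show which file name goes with which checksum, so the pairing has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:38e3...'."""
    expected = listed.split(':', 1)[-1].strip().lower()
    return md5_of_file(path) == expected

# Example call on a downloaded file (path is a placeholder):
# matches_listing('path/to/download', 'md5:<value from the listing>')
```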