Lorient-1k

Mathieu Lagrange; Félix Gontier

doi:10.5281/zenodo.10393315

Published December 15, 2023 | Version v4

Dataset Open

Created by Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes

Contact: mathieu.lagrange@cnrs.fr

If used for research, please refer to:

@article{gontier2021training,
  title={Polyphonic training set synthesis improves self-supervised urban sound classification},
  author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier},
  journal={The Journal of the Acoustical Society of America},
  year={2021},
  publisher={Acoustical Society of America}
}

Lorient-1k contains 30 acoustic scenes, each 45 seconds long.
These scenes were recorded with Zoom H4n handheld devices at 10 different locations in Lorient (France).
Four experts annotated the onset and offset times of three sources of interest: traffic, voice, and birds. These annotations were combined to produce a single annotation that is coherent with the notion of perceived time of presence. That is, the sum of activations per scene and per source is coherent with the perceived time of presence.

The total duration of the dataset is about 1.35k seconds (30 scenes of 45 seconds), i.e., 22.5 minutes.

The audio is provided as third-octave spectral data and mel spectrograms (as used by YAMNet). See demoTob.zip for a Python implementation of the third-octave computation from audio.

From a Python interpreter:

>> import numpy as np

>> s=np.load('Lorient-1k_spectralData.npy')

>> print(s.shape)

(30, 351, 29)

The three dimensions correspond respectively to the sceneId, the frameId (time), and the spectralId (frequency).
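As an illustration of this layout, the following minimal sketch builds a time axis for the frame dimension. It assumes that the 125 ms frame duration quoted for the annotations also applies to the spectral frames, and it uses a zero-filled placeholder with the documented shape so that it runs without the dataset; the names FRAME_DURATION_S, frame_times and scene_span_s are illustrative only.

```python
import numpy as np

# Assumption: spectral frames last 125 ms, as quoted for the annotation frames.
FRAME_DURATION_S = 0.125

# Placeholder with the documented shape (30, 351, 29).
# With the dataset at hand: s = np.load('Lorient-1k_spectralData.npy')
s = np.zeros((30, 351, 29))

n_scenes, n_frames, n_bands = s.shape

# Start time of each frame in seconds, under the assumption above.
frame_times = np.arange(n_frames) * FRAME_DURATION_S

# Time span covered by the frames of one scene, in seconds.
scene_span_s = n_frames * FRAME_DURATION_S

print(n_scenes, n_frames, n_bands)
print(frame_times[:4])
print(scene_span_s)
```

Under this assumption the 351 frames of a scene span a little under 44 seconds, which is close to the 45-second scene duration.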

>> a=np.load('Lorient-1k_presence.npy')

>> print(a.shape)

(30, 344, 3)

The three dimensions correspond respectively to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). The annotation is provided as a binary indicator of source presence over one second, that is, 8 consecutive 125 ms frames, with a hop of one frame.
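The frame counts are consistent with this windowing: sliding an 8-frame window with a hop of one frame over 351 spectral frames gives 351 - 8 + 1 = 344 positions. The sketch below groups spectral frames into such windows with NumPy's sliding_window_view (NumPy 1.20 or later). It assumes that presence frame i covers spectral frames i to i+7, which the description does not state explicitly, and it uses a placeholder array in place of the real data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOW = 8  # 8 consecutive 125 ms frames, i.e. one second

# Placeholder with the documented shape (30, 351, 29).
# With the dataset at hand: s = np.load('Lorient-1k_spectralData.npy')
s = np.zeros((30, 351, 29))

# Number of window positions along the time axis with a hop of one frame.
n_windows = s.shape[1] - WINDOW + 1

# Sliding windows over the time axis; the window axis is appended last.
# Assumption: window i (frames i..i+7) lines up with presence frame i.
windows = sliding_window_view(s, window_shape=WINDOW, axis=1)

print(n_windows)
print(windows.shape)
```

Each windows[scene, i] then has shape (29, 8), one column per 125 ms frame in the one-second window.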

>> a=np.load('Lorient-1k_time_of_presence.npy')

The time of presence is expressed as a percentage, per scene and per source.

>> print(a.shape)

(30, 3)
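One way to relate a per-frame presence array to a per-scene percentage is to average the binary indicator over time. The sketch below does this on a random placeholder array; it is an assumption about how the two files relate, not the authors' procedure, and the values stored in Lorient-1k_time_of_presence.npy may differ from this estimate.

```python
import numpy as np

SOURCES = ("traffic", "voice", "birds")  # source order given in the description

# Random binary placeholder with the documented shape (30, 344, 3).
# With the dataset at hand: presence = np.load('Lorient-1k_presence.npy')
rng = np.random.default_rng(0)
presence = rng.integers(0, 2, size=(30, 344, 3))

# Fraction of active frames per scene and per source, as a percentage.
estimated_top = 100.0 * presence.mean(axis=1)

print(estimated_top.shape)
for name, value in zip(SOURCES, estimated_top[0]):
    print(f"scene 0, {name}: {value:.1f} %")
```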

The audio files are also available as 16-bit, 44.1 kHz WAV files. Audio files are named in the same order as the first dimension of the .npy files: for 00x.wav, the third-octave data and the presence annotations are accessed using s[x-1, :, :] and a[x-1, :, :], and the time of presence (a two-dimensional array) using a[x-1, :].
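The mapping between array indices and file names can be sketched as follows. The helper names scene_filename and scene_index are hypothetical, and the three-digit zero padding for scenes 10 to 30 (for example 010.wav) is an assumption extrapolated from the 00x.wav pattern.

```python
N_SCENES = 30

def scene_filename(index: int) -> str:
    """Return the WAV file name for a 0-based index into the .npy arrays."""
    if not 0 <= index < N_SCENES:
        raise IndexError(f"scene index out of range: {index}")
    # Assumption: file numbers are 1-based and zero-padded to three digits.
    return f"{index + 1:03d}.wav"

def scene_index(filename: str) -> int:
    """Return the 0-based array index for a file name such as '007.wav'."""
    number = int(filename.split(".")[0])
    if not 1 <= number <= N_SCENES:
        raise ValueError(f"unexpected file name: {filename}")
    return number - 1

print(scene_filename(0))
print(scene_index("007.wav"))
```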

Files (57.5 MB)

Name	MD5	Size
audio.zip	38e35c67be3924dbd70480dc263269c1	55.0 MB
Lorient-1k_spectralData.npy	3fee608d254ba5f539ad450fa5ae3f98	2.4 MB
Lorient-1k_time_of_presence.npy	1a15f51bb65505b2501475a20c652913	848 Bytes
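The MD5 checksums listed above can be used to check a download. The following is a minimal sketch using Python's standard hashlib module; the helper name md5_of_file and the assumption that the files sit in the current directory are illustrative.

```python
import hashlib
import os

# Checksums as listed for this version of the dataset.
EXPECTED_MD5 = {
    "audio.zip": "38e35c67be3924dbd70480dc263269c1",
    "Lorient-1k_spectralData.npy": "3fee608d254ba5f539ad450fa5ae3f98",
    "Lorient-1k_time_of_presence.npy": "1a15f51bb65505b2501475a20c652913",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Assumption: downloaded files are in the current working directory.
for name, expected in EXPECTED_MD5.items():
    if not os.path.exists(name):
        status = "not found"
    elif md5_of_file(name) == expected:
        status = "OK"
    else:
        status = "MISMATCH"
    print(f"{name}: {status}")
```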
