
Dataset Open Access

Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Xavier Favory; Konstantinos Drossos; Tuomas Virtanen; Xavier Serra

This dataset consists of two HDF5 files containing pre-computed log-mel spectrograms that were used to train audio embedding models. The dataset is split into a training set and a validation set, containing 170793 and 19103 spectrogram patches respectively. Each patch is accompanied by multi-hot encoded tags drawn from a vocabulary of 1000 tags provided by Freesound users.
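To illustrate what "multi-hot encoded tags" means here, the following is a minimal sketch in plain Python. The function names and the four-tag toy vocabulary are hypothetical and chosen only for illustration; the actual dataset uses a 1000-tag vocabulary, and its internal layout in the HDF5 files is not described on this page.

```python
from typing import List, Sequence


def encode_tags(tags: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a multi-hot vector: 1 where a vocabulary tag is present, else 0."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tag in tags:
        # Tags outside the vocabulary are silently ignored.
        if tag in index:
            vector[index[tag]] = 1
    return vector


def decode_tags(vector: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Recover the tag names whose positions are set in a multi-hot vector."""
    return [tag for tag, flag in zip(vocabulary, vector) if flag]


# Toy vocabulary (hypothetical; the dataset uses 1000 Freesound tags).
vocabulary = ["rain", "guitar", "voice", "field-recording"]

vector = encode_tags(["voice", "rain", "unknown-tag"], vocabulary)
print(vector)                           # [1, 0, 1, 0]
print(decode_tags(vector, vocabulary))  # ['rain', 'voice']
```

Unlike a one-hot vector, several positions can be set at once, which is why this encoding suits sounds that carry more than one user-provided tag.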

More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The code is available at this GitHub repository.

 

License:

This dataset is derived from content in the Freesound collection. All sounds are released under one of the following Creative Commons (CC) licenses: CC0, CC-BY, CC-S+, or CC-BY-NC. We attribute the authors of all sounds used in the dataset and provide the corresponding licenses in the attributions.txt file.

 

Files (7.4 GB)
Name                     Size       MD5
attributions.txt         23.1 MB    f8caf5d6797fab41d5309fb982c0a9e9
spec_tags_top_1000       6.6 GB     b75e6a7e0fb96cad034c523c4cf9f804
spec_tags_top_1000_val   742.5 MB   af3b62f3105b516e93ac7ad4c4227028
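The MD5 checksums listed with the files can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The helper names (md5_of_file, verify, check_all) are hypothetical, and the sketch assumes the files were saved locally under the names shown in the listing.

```python
import hashlib
import os

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "attributions.txt": "f8caf5d6797fab41d5309fb982c0a9e9",
    "spec_tags_top_1000": "b75e6a7e0fb96cad034c523c4cf9f804",
    "spec_tags_top_1000_val": "af3b62f3105b516e93ac7ad4c4227028",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()


def check_all(directory="."):
    """Map each listed file name to True, False, or None (file not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = verify(path, expected) if os.path.exists(path) else None
    return results
```

Calling check_all("/path/to/downloads") returns a dictionary keyed by file name. Reading in chunks keeps memory use small, which matters for the 6.6 GB training file.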
