Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Xavier Favory; Konstantinos Drossos; Tuomas Virtanen; Xavier Serra

doi:10.5281/zenodo.3887261

Published June 9, 2020 | Version v1

Dataset | Open access

1. Music Technology Group, Universitat Pompeu Fabra
2. Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University

This dataset consists of two HDF5 files containing pre-computed log-mel spectrograms that were used to train audio embedding models. It is split into a training set of 170,793 spectrogram patches and a validation set of 19,103 patches. Each patch is accompanied by multi-hot encoded tags drawn from a vocabulary of 1,000 tags provided by Freesound users.
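
As a minimal sketch of what "multi-hot encoded" means here: each patch's tags become a 0/1 vector with one position per vocabulary tag, set to 1 where the tag is present. The toy vocabulary and the helper names below are hypothetical; the actual tag ordering and the internal layout of the HDF5 files are not described on this page.

```python
from typing import Dict, Iterable, List, Sequence


def build_index(vocabulary: Sequence[str]) -> Dict[str, int]:
    """Map each tag in a (hypothetical) vocabulary to its column index."""
    return {tag: i for i, tag in enumerate(vocabulary)}


def multi_hot(tags: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Encode one patch's tags as a 0/1 vector of length len(index).

    Tags that are not in the vocabulary are ignored, as a fixed
    1000-tag vocabulary would require.
    """
    vector = [0] * len(index)
    for tag in tags:
        position = index.get(tag)
        if position is not None:
            vector[position] = 1
    return vector


def decode(vector: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Recover the active tags (in vocabulary order) from a multi-hot vector."""
    return [tag for tag, bit in zip(vocabulary, vector) if bit]


# Toy 5-tag vocabulary standing in for the real 1000-tag one.
vocab = ["drum", "guitar", "loop", "field-recording", "voice"]
idx = build_index(vocab)
encoded = multi_hot(["loop", "drum", "synth"], idx)  # "synth" is out of vocabulary
print(encoded)                 # [1, 0, 1, 0, 0]
print(decode(encoded, vocab))  # ['drum', 'loop']
```

With the real vocabulary each vector would have 1,000 entries, one per tag.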

More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The accompanying code is available on GitHub.

License:

This dataset is derived from content in the Freesound collection. All sounds are released under one of the following Creative Commons (CC) licenses: CC0, CC-BY, CC-S+ (Sampling+), or CC-BY-NC. The authors of all sounds used in the dataset are credited, together with the corresponding licenses, in the attributions.txt file.

Files (7.4 GB)

Name	Size	MD5
attributions.txt	23.1 MB	f8caf5d6797fab41d5309fb982c0a9e9
spec_tags_top_1000	6.6 GB	b75e6a7e0fb96cad034c523c4cf9f804
spec_tags_top_1000_val	742.5 MB	af3b62f3105b516e93ac7ad4c4227028
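
A minimal sketch for checking downloaded files against the MD5 checksums listed in the file table, using only Python's standard library. It assumes all three files are saved in one directory under exactly the names shown; if your download adds an extension (for example `.h5`), adjust the keys of the hypothetical `EXPECTED_MD5` mapping. Files are hashed in chunks so the 6.6 GB training file never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file table; the keys assume the listed file names.
EXPECTED_MD5 = {
    "attributions.txt": "f8caf5d6797fab41d5309fb982c0a9e9",
    "spec_tags_top_1000": "b75e6a7e0fb96cad034c523c4cf9f804",
    "spec_tags_top_1000_val": "af3b62f3105b516e93ac7ad4c4227028",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

A mismatch usually means an interrupted or corrupted download, so the affected file should be fetched again before training.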
