Dataset Open Access
This dataset consists of two hdf5 files that contain pre-computed log-mel spectrograms that have been used to to train audio embedding models. The dataset is split into a training set and a validation set containing respectively 170793 and 19103 spectrogram patches with their accompanying multi-hot encoded tags from a vocabulary of 1000 tags provided by Freesound users.
More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The code is available at this GitHub repository.
This dataset is derived from content from the Freesound collection. All sounds are released under Creative Commons (CC) licenses from either CC0, CC-BY, CC-S+, or CC-BY-NC. We attribute authors of all the sounds used in the dataset and provide their corresponding licenses in the attributions.txt file.