Algorithms and representations for supporting online music creation with large-scale audio databases

doi:10.5281/zenodo.3674147

Published June 5, 2015 | Version v1

Thesis Open

Roma, Gerard¹

1. Universitat Pompeu Fabra

Supervisor:

Serra, Xavier¹

1. Universitat Pompeu Fabra

The rapid adoption of the Internet and web technologies has created an opportunity for making music collaboratively by sharing information online. However, current applications for online music making do not take advantage of the potential of shared information. The goal of this dissertation is to provide and evaluate algorithms and representations for interacting with large audio databases that facilitate music creation by online communities. This work has been developed in the context of Freesound, a large-scale, community-driven database of audio recordings shared under Creative Commons (CC) licenses. The diversity of sounds available through this kind of platform is unprecedented. At the same time, the unstructured nature of community-driven processes poses new challenges for indexing and retrieving information to support musical creativity. In this dissertation we propose and evaluate algorithms and representations for dealing with the main elements required by online music making applications based on large-scale audio databases: sound files (including time-varying and aggregate representations), taxonomies for retrieving sounds, music representations, and community models. As a generic low-level representation for audio signals, we analyze the framework of cepstral coefficients, evaluating their performance on example classification tasks. We found that switching to more recent auditory filters, such as gammatone filters, improves, at large scales, on traditional representations based on the mel scale. We then consider common types of sounds for obtaining aggregated representations. We show that several time series analysis features computed from the cepstral coefficients complement traditional statistics for improved performance. For interacting with large databases of sounds, we propose a novel unsupervised algorithm that automatically generates taxonomical organizations based on the low-level signal representations. Based on user studies, we show that our approach can be used in place of traditional supervised classification approaches for providing a lexicon of acoustic categories suitable for creative applications. Next, we describe a computational representation for music based on audio samples. We demonstrate through a user experiment that it facilitates collaborative creation and supports computational analysis using the lexicons generated by sound taxonomies. Finally, we deal with the representation and analysis of user communities. We propose a method for measuring collective creativity in audio sharing. By analyzing the activity of the Freesound community over a period of more than five years, we show that the proposed creativity measures can be significantly related to social structure as characterized by network analysis.
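
As an illustration of the cepstral-coefficient framework mentioned in the description, the following is a minimal sketch and not the implementation used in the thesis. It assumes the band energies of one audio frame have already been computed by some auditory filterbank (mel or gammatone; the filterbank itself is not shown). The function names, the log floor of 1e-10, and the unnormalized DCT-II are assumptions made for this example.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using one common formula (several variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def cepstral_coefficients(band_energies, n_coeffs):
    """Return the first n_coeffs cepstral coefficients of one frame.

    band_energies: non-negative energies, one per filterbank channel
    (hypothetical input; any auditory filterbank could produce them).
    The coefficients are the unnormalized DCT-II of the log energies.
    """
    n_bands = len(band_energies)
    # Floor the energies so that log() is defined for silent bands.
    log_e = [math.log(max(e, 1e-10)) for e in band_energies]
    return [
        sum(
            log_e[m] * math.cos(math.pi * k * (m + 0.5) / n_bands)
            for m in range(n_bands)
        )
        for k in range(n_coeffs)
    ]
```

In this formulation, replacing a mel filterbank with a gammatone filterbank only changes how `band_energies` is obtained; the log and DCT steps stay the same.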

Files

Name	Size
groma_thesis.pdf (md5:737887c1b3bbc0bd7ef7c53360f2a9ab)	5.6 MB

Additional details

Funding: European Commission, grant 267583 (COMPMUSIC – Computational models for the discovery of the world's music)

	All versions	This version
Views	49	49
Downloads	77	77
Data volume	456.6 MB	456.6 MB
