Published June 7, 2022 | Version v2
Conference paper Open

Audio Metaphor 2.0: An Improved Classification and Segmentation Pipeline for Generative Sound Design Systems

  • 1. University of British Columbia
  • 2. Simon Fraser University

Description

Soundscape composition and design is the creative practice of processing and combining sound recordings to evoke auditory associations and memories within a listener. We present a new set of classification and segmentation algorithms as part of Audio Metaphor (AUME), a generative system for creating novel soundscape compositions. Audio Metaphor processes natural language queries from a user to retrieve semantically linked sound recordings from a database containing 395,541 audio files. Building on previous work, we implemented a new audio feature extractor and conducted experiments to test the accuracy of the updated system. We then classified audio files according to general soundscape composition categories, improved emotion prediction, and refined our segmentation algorithm. The model maintains good accuracy in segment classification, and the valence and arousal prediction models improve significantly, as indicated by r-squared values of 72.2% and 92.0% and mean squared error values of 0.09 and 0.03 for valence and arousal, respectively. An empirical analysis also finds that the new system provides better segmentation results.
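For readers unfamiliar with the two regression metrics reported above, the following is a minimal sketch of how r-squared and mean squared error are conventionally computed. It is not the authors' evaluation code; the function names and the toy ratings are hypothetical and serve only to illustrate the standard definitions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after applying the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical valence ratings and model predictions (illustrative only,
# not data from the paper).
ratings = [0.0, 0.5, 1.0]
predictions = [0.0, 0.5, 0.5]

print("MSE:", mean_squared_error(ratings, predictions))
print("R^2:", r_squared(ratings, predictions))
```

An r-squared closer to 1 and a mean squared error closer to 0 both indicate predictions that track the annotated values more closely, which is the sense in which the reported figures signal an improvement.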

Files

51.pdf (730.4 kB)
md5:52bd48e5e56dec80ae700a36e6bda44c