Synthesis by Layering: Learning a Variational Space of Drum Sounds
In this work, we present a variational autoencoder that reconstructs drum samples as linear combinations of samples drawn from a small, predefined library of existing recordings. Inspired by the music production practice of layering two or more samples on top of each other to create rich and unique textures, we synthesize drum sounds by producing sparse sets of mixing coefficients for the library; the weighted samples are then layered to create new audio. By training this model to approximate a range of professionally produced and recorded drum samples, we aim to learn a distribution over possible layering strategies given a fixed sample library, which we can subsequently sample from or otherwise manipulate. We find that varying a particular dimension of the latent vectors in the learned space does not simply scale the mixing weights linearly; rather, it smoothly varies the perceptual character of the output by swapping different samples in and out of the sparse mixture. We present a prototype user interface for engaging intuitively with our system, discuss the performance of our modeling approach, and highlight potential applications in a studio production environment.
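To make the core operation concrete, the layering step described above amounts to a sparse linear combination of library waveforms. The following is a minimal sketch, not the paper's implementation: the library contents, dimensions, and the `layer` function are illustrative placeholders, with random noise standing in for real drum recordings.

```python
import numpy as np

# Hypothetical library of K drum samples, each N audio frames long
# (random placeholders standing in for real recordings).
rng = np.random.default_rng(0)
K, N = 8, 4410                      # e.g. 8 samples, 100 ms at 44.1 kHz
library = rng.standard_normal((K, N))

def layer(weights, library):
    """Mix library samples into one layered drum hit.

    weights: length-K vector of mixing coefficients. Sparsity (most
    entries zero) corresponds to layering only a few samples, as a
    producer would.
    """
    weights = np.asarray(weights)
    return weights @ library        # weighted sum of waveforms

# Sparse mixture: layer only samples 1 and 5.
w = np.zeros(K)
w[1], w[5] = 0.7, 0.3
out = layer(w, library)
assert out.shape == (N,)
assert np.allclose(out, 0.7 * library[1] + 0.3 * library[5])
```

In the model described above, such weight vectors are not set by hand but decoded from latent vectors, so that moving through the latent space changes which samples enter the mixture.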