Published November 4, 2019 | Version v1
Conference paper · Open Access

Fast and Flexible Neural Audio Synthesis

Description

Autoregressive neural networks, such as WaveNet, have opened up new avenues for expressive audio synthesis. High-quality speech synthesis utilizes detailed linguistic features for conditioning, but comparable levels of control have yet to be realized for neural synthesis of musical instruments. Here, we demonstrate an autoregressive model capable of synthesizing realistic audio that closely follows fine-scale temporal conditioning for loudness and fundamental frequency. We find that an appropriate choice of conditioning features and architecture improves both the quantitative accuracy of audio resynthesis and the qualitative responsiveness to creative manipulation of the conditioning. While large autoregressive models generate audio much slower than real-time, we achieve these results with a more efficient WaveRNN model, opening the door to exploring real-time interactive audio synthesis with neural networks.
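As a rough illustration of the kind of conditioning signals described above, the sketch below extracts frame-level fundamental frequency and A-weighted loudness curves with librosa. This is not the paper's exact feature pipeline; the sample rate, frame and hop sizes, pitch range, and the particular loudness definition are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's pipeline): frame-level f0 and
# loudness features of the sort used to condition an audio synthesis model.
import numpy as np
import librosa

def conditioning_features(path, sr=16000, frame_length=1024, hop_length=256):
    """Return time-aligned f0 (Hz) and loudness (dB) curves for an audio file."""
    y, _ = librosa.load(path, sr=sr, mono=True)

    # Fundamental frequency per frame via probabilistic YIN (pYIN).
    f0, voiced, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
        frame_length=frame_length,
        hop_length=hop_length,
    )
    f0 = np.where(voiced, f0, 0.0)  # zero out unvoiced frames

    # Loudness per frame: A-weighted spectral power summed over frequency, in dB.
    S = np.abs(librosa.stft(y, n_fft=frame_length, hop_length=hop_length))
    freqs = librosa.fft_frequencies(sr=sr, n_fft=frame_length)
    a_weight = librosa.A_weighting(freqs)[:, np.newaxis]       # dB offset per bin
    power = (S ** 2) * (10.0 ** (a_weight / 10.0))              # weighted linear power
    loudness = 10.0 * np.log10(np.sum(power, axis=0) + 1e-10)   # frame loudness in dB

    return f0, loudness  # both sampled on the same hop grid
```

Both curves are computed on the same hop grid so they can be upsampled together to the audio sample rate and fed to the synthesis network as fine-scale temporal conditioning.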

Files

ismir2019_paper_000063.pdf

4.1 MB · md5:2687152f1c01c4f7dbd60b9aae34b961