Published November 4, 2019 | Version v1 | Conference paper | Open
Fast and Flexible Neural Audio Synthesis
Description
Autoregressive neural networks, such as WaveNet, have opened up new avenues for expressive audio synthesis. High-quality speech synthesis relies on detailed linguistic features for conditioning, but comparable levels of control have yet to be realized for neural synthesis of musical instruments. Here, we demonstrate an autoregressive model capable of synthesizing realistic audio that closely follows fine-scale temporal conditioning on loudness and fundamental frequency. We find that the appropriate choice of conditioning features and architectures improves both the quantitative accuracy of audio resynthesis and the qualitative responsiveness to creative manipulation of the conditioning. While large autoregressive models generate audio much slower than real-time, we achieve these results with a more efficient WaveRNN model, opening the door to real-time interactive audio synthesis with neural networks.
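To make the conditioning signals concrete, below is a minimal sketch (not the authors' code) of how frame-level loudness and fundamental frequency, the two features named in the abstract, could be extracted with librosa. The frame and hop sizes, the A-weighted loudness measure, and the pYIN f0 tracker are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch of extracting per-frame conditioning features
# (loudness in dB and f0 in Hz) for a conditioned autoregressive vocoder.
import numpy as np
import librosa

def conditioning_features(path, sr=16000, n_fft=1024, hop=64):
    """Return per-frame (loudness_db, f0_hz) arrays aligned on the same hop grid."""
    y, sr = librosa.load(path, sr=sr)

    # Loudness: A-weighted log power, averaged across frequency bins per frame.
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    weighted_db = librosa.perceptual_weighting(power, freqs, kind="A")
    loudness_db = weighted_db.mean(axis=0)

    # Fundamental frequency: probabilistic YIN; unvoiced frames return NaN.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"),
        sr=sr, frame_length=n_fft, hop_length=hop)
    f0 = np.where(voiced, f0, 0.0)  # simple choice: zero out unvoiced frames

    return loudness_db, f0
```

With a 64-sample hop at 16 kHz, each conditioning frame covers 4 ms, which is in the spirit of the "fine-scale temporal conditioning" the paper describes; the exact rates and feature definitions in the published system may differ.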
Files

| Name | Size | md5 |
|---|---|---|
| ismir2019_paper_000063.pdf | 4.1 MB | 2687152f1c01c4f7dbd60b9aae34b961 |