Published November 4, 2019 | Version v1 | Conference paper | Open
Fast and Flexible Neural Audio Synthesis
Description
Autoregressive neural networks, such as WaveNet, have opened up new avenues for expressive audio synthesis. High-quality speech synthesis relies on detailed linguistic features for conditioning, but comparable levels of control have yet to be realized for neural synthesis of musical instruments. Here, we demonstrate an autoregressive model capable of synthesizing realistic audio that closely follows fine-scale temporal conditioning on loudness and fundamental frequency. We find that the appropriate choice of conditioning features and architectures improves both the quantitative accuracy of audio resynthesis and the qualitative responsiveness to creative manipulation of the conditioning. While large autoregressive models generate audio much slower than real-time, we achieve these results with a more efficient WaveRNN model, opening the door to real-time interactive audio synthesis with neural networks.
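To make the conditioning signals concrete, below is a minimal sketch (not the authors' code) of how frame-level loudness and fundamental frequency, the two features named in the abstract, could be extracted with librosa. The frame and hop sizes, the A-weighted loudness measure, and the pYIN f0 tracker are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch of extracting per-frame conditioning features
# (loudness in dB and f0 in Hz) for a conditioned autoregressive vocoder.
import numpy as np
import librosa

def conditioning_features(path, sr=16000, n_fft=1024, hop=64):
    """Return per-frame (loudness_db, f0_hz) arrays aligned on the same hop grid."""
    y, sr = librosa.load(path, sr=sr)

    # Loudness: A-weighted log power, averaged across frequency bins per frame.
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    weighted_db = librosa.perceptual_weighting(power, freqs, kind="A")
    loudness_db = weighted_db.mean(axis=0)

    # Fundamental frequency: probabilistic YIN; unvoiced frames return NaN.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"),
        sr=sr, frame_length=n_fft, hop_length=hop)
    f0 = np.where(voiced, f0, 0.0)  # simple choice: zero out unvoiced frames

    return loudness_db, f0
```

With a 64-sample hop at 16 kHz, each conditioning frame covers 4 ms, which is in the spirit of the "fine-scale temporal conditioning" the paper describes; the exact rates and feature definitions in the published system may differ.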
Files

| Name | Size | md5 |
|---|---|---|
| ismir2019_paper_000063.pdf | 4.1 MB | 2687152f1c01c4f7dbd60b9aae34b961 |