Chris Donahue
Huanru Henry Mao
Yiting Ethan Li
Garrison Cottrell
Julian McAuley
2019-11-04
We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation—here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-voice scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music. Despite differences between the two corpora, we find that this pre-training procedure improves both quantitative and qualitative performance for our primary task.
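The two-stage procedure the abstract describes (pre-train on a large heterogeneous music corpus, then fine-tune on the target NES-MDB domain) can be sketched in miniature. The bigram counter and the two tiny token lists below are illustrative stand-ins of my own, not the paper's Transformer or its actual datasets:

```python
from collections import defaultdict

def train_bigram(counts, tokens):
    """Accumulate bigram counts from a token sequence (toy stand-in for LM training)."""
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if not counts[token]:
        return None
    return max(counts[token], key=counts[token].get)

# Hypothetical corpora: a large heterogeneous pre-training corpus and a small
# in-domain fine-tuning corpus (stand-ins for Lakh MIDI and NES-MDB).
pretrain_corpus = ["C", "E", "G", "C", "E", "G", "C"]
finetune_corpus = ["C", "E", "G", "B"]

counts = defaultdict(lambda: defaultdict(int))
train_bigram(counts, pretrain_corpus)  # stage 1: pre-train on the large corpus
train_bigram(counts, finetune_corpus)  # stage 2: fine-tune on the target corpus
```

After fine-tuning, the model retains statistics from both corpora: transitions seen only in the small target corpus (such as G to B here) are available alongside the patterns learned during pre-training, which is the intuition behind the transfer the paper reports.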
https://doi.org/10.5281/zenodo.3527902
ISMIR
https://zenodo.org/communities/ismir
https://doi.org/10.5281/zenodo.3527901
Open Access
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
ISMIR 2019, International Society for Music Information Retrieval Conference, Delft, The Netherlands, November 4-8, 2019
LakhNES: Improving Multi-instrumental Music Generation with Cross-domain Pre-training
Conference paper