This is the pretrained model for our paper submitted to ICASSP 2023:
"CAN KNOWLEDGE OF END-TO-END TEXT-TO-SPEECH MODELS IMPROVE NEURAL
MIDI-TO-AUDIO SYNTHESIS SYSTEMS?"
Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan
https://arxiv.org/abs/2211.13868
Please cite this paper if you use this pretrained model.

This pretrained model goes with the code found here:
https://github.com/nii-yamagishilab/midi-to-audio

See that codebase's README for more information about dependencies etc.

The code for training this model was based on the ESPnet-TTS project:
"ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end
text-to-speech toolkit," ICASSP 2020
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura,
Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan

This model was trained using the MAESTRO dataset:
"Enabling factorized piano music modeling and generation with the MAESTRO
dataset," ICLR 2019
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang,
Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck

This model consists of a MIDI-to-mel component based on Transformer-TTS:
"Neural speech synthesis with transformer network," AAAI 2019
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu

and a HiFi-GAN-based mel-to-audio component:
"HiFi-GAN: Generative Adversarial Networks for Efficient and High
Fidelity Speech Synthesis," NeurIPS 2020
Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae

The two components were first separately trained, and then jointly fine-tuned
for an additional 200K steps.
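
As a rough illustration of how the two components fit together at synthesis
time, the sketch below chains a MIDI-to-mel stage and a mel-to-audio stage.
It is not taken from the codebase: the function names, the type aliases, and
the toy values N_MELS and HOP are all hypothetical, and the two toy stages are
simple stand-ins rather than the Transformer-TTS or HiFi-GAN models. It shows
only the inference data flow, not the separate training or joint fine-tuning.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases for illustration; the real model uses tensors.
Mel = List[List[float]]   # frames x mel bins
Audio = List[float]       # waveform samples


def synthesize(
    midi_tokens: Sequence[int],
    midi_to_mel: Callable[[Sequence[int]], Mel],
    mel_to_audio: Callable[[Mel], Audio],
) -> Audio:
    """Run the two stages in sequence: MIDI tokens -> mel frames -> waveform."""
    mel = midi_to_mel(midi_tokens)
    return mel_to_audio(mel)


# Toy stand-ins so the sketch runs. They are NOT the released models.
N_MELS = 4  # assumed toy number of mel bins
HOP = 2     # assumed toy number of waveform samples per mel frame


def toy_midi_to_mel(tokens: Sequence[int]) -> Mel:
    # One mel frame per token, every bin set to the scaled token value.
    return [[t / 128.0] * N_MELS for t in tokens]


def toy_mel_to_audio(mel: Mel) -> Audio:
    # Emit HOP samples per frame, each equal to the mean of that frame's bins.
    audio: Audio = []
    for frame in mel:
        audio.extend([sum(frame) / len(frame)] * HOP)
    return audio


audio = synthesize([60, 64, 67], toy_midi_to_mel, toy_mel_to_audio)
```

Passing the stages in as callables keeps the sketch independent of any
particular toolkit; for the actual loading and inference commands, see the
README of the codebase linked above.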


COPYING

This pretrained model is licensed under the Creative Commons License:
Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/legalcode
Please see `LICENSE.txt` for the terms and conditions of this pretrained model.


ACKNOWLEDGMENTS

This study is supported by the Japanese-French joint national project called
VoicePersonae, JST CREST (JPMJCR18A6, JPMJCR20D3), MEXT KAKENHI Grants
(21K17775, 21H04906, 21K11951), Japan, and the Google AI for Japan program.