This is the pretrained model for our paper submitted to ICASSP 2023:
"CAN KNOWLEDGE OF END-TO-END TEXT-TO-SPEECH MODELS IMPROVE NEURAL MIDI-TO-AUDIO SYNTHESIS SYSTEMS?"
Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan
https://arxiv.org/abs/2211.13868
Please cite this paper if you use this pretrained model.

This pretrained model goes with the code found here:
https://github.com/nii-yamagishilab/midi-to-audio
See that codebase's README for more information about dependencies etc.

The code for training this model was based on the ESPnet-TTS project:
"ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit," ICASSP 2020
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan

This model was trained using the MAESTRO dataset:
"Enabling factorized piano music modeling and generation with the MAESTRO dataset," ICLR 2019
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck

This model consists of a MIDI-to-mel component based on Transformer-TTS:
"Neural speech synthesis with transformer network," AAAI 2019
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu
and a HiFi-GAN-based mel-to-audio component:
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis," NeurIPS 2020
Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae
The two components were first trained separately and then jointly fine-tuned for an additional 200K steps.

COPYING
This pretrained model is licensed under the Creative Commons License: Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/legalcode
Please see `LICENSE.txt` for the terms and conditions of this pretrained model.

ACKNOWLEDGMENTS
This study is supported by the Japanese-French joint national project called VoicePersonae, JST CREST (JPMJCR18A6, JPMJCR20D3), MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951), Japan, and the Google AI for Japan program.