This is the pretrained model for our paper presented at ICASSP 2022:

"Generalization Ability of MOS Prediction Networks"
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
https://arxiv.org/abs/2110.02635

Please cite this paper if you use this pretrained model.

This pretrained model goes with the code found here:
https://github.com/nii-yamagishilab/mos-finetune-ssl
See that codebase's README for more information about dependencies etc. A minimal usage sketch also appears at the end of this README.

This model was fine-tuned from a pretrained model from the Fairseq project (Wav2Vec 2.0 Base, no fine-tuning):
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec

Fine-tuning used the BVCC dataset:

"How do Voices from Past Speech Synthesis Challenges Compare Today?"
Erica Cooper and Junichi Yamagishi, SSW 2021.
https://www.isca-speech.org/archive/ssw_2021/cooper21_ssw.html

COPYING

Please see `LICENSE-wav2vec2.txt` for the terms and conditions of this pretrained model.

ACKNOWLEDGMENTS

This study is supported by JST CREST grants JPMJCR18A6, JPMJCR20D3, and JPMJCR19A3, and by MEXT KAKENHI grants 21K11951 and 21K19808. Thanks to the organizers of the Blizzard Challenge and Voice Conversion Challenge, and to Zhenhua Ling, Zhihang Xie, and Zhizheng Wu for answering our questions about past challenges. Thanks also to the Fairseq team for making their code and models available, and to the creators of ESPnet-TTS for making their audio samples available.
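
EXAMPLE USAGE

A minimal sketch of how this checkpoint can be used for MOS prediction, modeled on the prediction code in the mos-finetune-ssl codebase: a wav2vec 2.0 Base encoder with mean pooling over time and a linear output head. The checkpoint filenames below (`wav2vec_small.pt`, `ckpt`, `sample.wav`) are placeholders, not files shipped with this model; prefer the exact loading code in that repository.

```python
import torch
import torch.nn as nn
import torchaudio
import fairseq

# Load the wav2vec 2.0 Base model that this checkpoint was fine-tuned from.
# 'wav2vec_small.pt' is a placeholder path to the Fairseq wav2vec 2.0 Base checkpoint.
ssl_models, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec_small.pt'])
ssl_model = ssl_models[0]
ssl_model.remove_pretraining_modules()

class MosPredictor(nn.Module):
    """wav2vec 2.0 encoder + mean pooling over time + linear output head."""
    def __init__(self, ssl_model, ssl_out_dim=768):  # 768 = feature dim of the Base model
        super().__init__()
        self.ssl_model = ssl_model
        self.output_layer = nn.Linear(ssl_out_dim, 1)

    def forward(self, wav):  # wav: (batch, samples) at 16 kHz
        feats = self.ssl_model(wav, mask=False, features_only=True)['x']
        return self.output_layer(feats.mean(dim=1)).squeeze(1)  # one scalar MOS per utterance

# 'ckpt' is a placeholder for this pretrained model's checkpoint file.
model = MosPredictor(ssl_model)
model.load_state_dict(torch.load('ckpt', map_location='cpu'))
model.eval()

wav, sr = torchaudio.load('sample.wav')  # expects 16 kHz mono audio
with torch.no_grad():
    print('predicted MOS:', model(wav).item())
```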