Published September 13, 2019 | Version v1
Conference paper Open

Weighted generative adversarial network for many-to-many voice conversion

  • 1. University of Crete, Greece
  • 2. Institute of Applied and Computational Mathematics FORTH, Greece

Description

The goal of voice conversion (VC) is to convert speech from a source speaker to that of a target speaker without changing the phonetic content. VC usually relies on parallel data for training, which limits its practical applications. Existing approaches are also limited in handling multiple speakers, since a separate model must be built for every speaker pair. To address this, a variant of the Generative Adversarial Network (StarGAN-VC) was introduced that allows many-to-many mapping instead of learning all the pairwise transformations. Moreover, StarGAN-VC can handle non-parallel data, i.e., speakers do not need to utter the same sentences. In this paper, we suggest an algorithmic variation of StarGAN training in which suitable weights are introduced. The weights, which modify the Generator’s gradient, aim to give more influence to fake samples that fool the Discriminator. The suggested algorithm results in a stronger Generator. We refer to this variation as weighted-StarGAN (weStarGAN). In weStarGAN, training converges faster. More importantly, the proposed algorithm achieves a significant improvement over the baseline StarGAN-VC in subjective evaluations of both speech quality and speaker similarity.
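The weighting idea in the description can be illustrated with a small sketch. The abstract does not give the weight formula, so everything specific here is an assumption made for illustration: the function names, the choice of weights proportional to the Discriminator score on each fake sample, and the use of the non-saturating Generator loss. It is not the paper's algorithm, only one plausible instance of "putting more weight on fake samples that fool the Discriminator".

```python
import numpy as np


def weighted_generator_loss(d_fake, eps=1e-8):
    """Illustrative weighted Generator loss (hypothetical, not from the paper).

    d_fake: Discriminator outputs in (0, 1) for a batch of generated samples;
            values near 1 mean the sample fooled the Discriminator.
    Returns the scalar loss and the per-sample weights.
    """
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # Assumed weighting: proportional to the Discriminator score, normalised
    # over the batch. In an autodiff framework these weights would normally be
    # treated as constants so that they only rescale each sample's gradient.
    weights = d_fake / (d_fake.sum() + eps)
    # Non-saturating Generator loss per sample: -log D(G(x)).
    per_sample = -np.log(d_fake + eps)
    return float(np.sum(weights * per_sample)), weights


def uniform_generator_loss(d_fake, eps=1e-8):
    """Unweighted baseline: plain batch mean of -log D(G(x))."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(np.mean(-np.log(d_fake + eps)))


# Toy batch: the first fake sample fools the Discriminator, the last does not.
scores = np.array([0.9, 0.5, 0.1])
loss_w, w = weighted_generator_loss(scores)
loss_u = uniform_generator_loss(scores)
```

With this toy batch, the sample scored 0.9 receives the largest weight and the sample scored 0.1 the smallest, so the update is dominated by samples that already fool the Discriminator, which is the behaviour the description attributes to the weights.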

Files

ESR10_ICA2019_000761.pdf

Files (277.8 kB)

Size: 277.8 kB
md5:e7a735f717d9721c9565e65618c98402

Additional details

Funding

European Commission
ENRICH - Enriched communication across the lifespan (grant 675324)