Published September 19, 2019 | Version v1
Conference paper Open

Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks

  • 1. Department of Computer Science, University of Crete, Greece
  • 2. Institute of Applied and Computational Mathematics, FORTH, Greece

Description

In this paper, we propose a novel way to train a Generative Adversarial Network (GAN) for non-parallel, many-to-many voice conversion. The goal of voice conversion (VC) is to transform speech from a source speaker into that of a target speaker without changing the phonetic content. Based on ideas from game theory, we propose multiplying the gradient of the Generator by suitable weights. The weights are calculated so that they increase the power of fake samples that fool the Discriminator, resulting in a stronger Generator. Motivated by a recently presented GAN-based approach for VC, StarGAN-VC, we propose a variation of StarGAN, referred to as Weighted StarGAN (WeStarGAN). The experiments are conducted on the standard CMU ARCTIC database. The WeStarGAN-VC approach achieves significantly better relative performance and is clearly preferred over the recently proposed StarGAN-VC method in terms of subjective speech quality and speaker similarity, with 75% and 65% preference scores, respectively.
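The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the weight formula, so the softmax over Discriminator scores used here is an assumption chosen only because it gives larger weights to fake samples that fool the Discriminator more. The function names and the `temperature` parameter are hypothetical and do not come from the paper. Because the weights are treated as constants, differentiating the weighted loss amounts to multiplying each sample's Generator gradient by its weight.

```python
import math


def fooling_weights(d_fake, temperature=1.0):
    """Softmax weights over Discriminator scores on fake samples.

    d_fake: Discriminator outputs in (0, 1) for a batch of generated samples;
            a higher score means the sample fooled the Discriminator more.
    temperature: hypothetical sharpness knob (assumption, not from the paper).
    """
    scaled = [s / temperature for s in d_fake]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_generator_loss(d_fake, weights):
    """Non-saturating Generator loss with constant per-sample weights."""
    return -sum(w * math.log(s) for w, s in zip(weights, d_fake))


# Toy batch: the third fake sample fools the Discriminator the most.
d_fake = [0.1, 0.5, 0.9]
w = fooling_weights(d_fake)
loss = weighted_generator_loss(d_fake, w)
```

In an automatic-differentiation framework, the same effect would typically be obtained by computing the weights without tracking gradients and then backpropagating the weighted loss.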

Files

ESR10_Interspeech2019_2869.pdf (351.9 kB)
md5:b3e8e116eda265478ab4258aceca49e3

Additional details

Funding

ENRICH – Enriched communication across the lifespan 675324
European Commission