Weighted generative adversarial network for many-to-many voice conversion

Paul, Dipjyoti; Pantazis, Yannis; Stylianou, Yannis

doi:10.18154/RWTH-CONV-239420

Published September 13, 2019 | Version v1

Conference paper Open

1. University of Crete, Greece
2. Institute of Applied and Computational Mathematics FORTH, Greece

The goal of voice conversion (VC) is to convert speech from a source speaker to that of a target speaker without changing the phonetic content. VC usually relies on parallel data for training, which limits its practical applications. Existing approaches are also limited in handling multiple speakers, since a separate model must be built independently for every speaker pair. To address this, a variant of the Generative Adversarial Network (StarGAN-VC) was introduced that allows many-to-many mapping instead of learning all the pairwise transformations. Moreover, StarGAN-VC can handle non-parallel data, i.e., speakers do not need to utter the same sentences. In this paper, we suggest an algorithmic variation of StarGAN training in which suitable weights are introduced. The weights, which modify the Generator’s gradient, aim to give more influence to fake samples that fool the Discriminator. The suggested algorithm results in a stronger Generator. We refer to this variation as weighted-StarGAN (weStarGAN). In weStarGAN, training converges faster. More importantly, the proposed algorithm achieves a significant improvement over the baseline StarGAN-VC in subjective evaluation of both speech quality and speaker similarity.
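
The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the weight formula, so everything specific here is an assumption made for illustration: the softmax-over-logits weighting, the `temperature` parameter, the non-saturating generator loss, and the function names are all hypothetical and are not taken from the paper. The sketch only shows how per-sample weights, treated as constants, would rescale each fake sample's contribution to the Generator's loss (and therefore to its gradient), so that fake samples the Discriminator scores as more realistic count for more.

```python
import numpy as np


def sigmoid(x):
    """Map discriminator logits to probabilities of 'real'."""
    return 1.0 / (1.0 + np.exp(-x))


def fake_sample_weights(d_logits, temperature=1.0):
    """Hypothetical weighting (an assumption, not the paper's formula):
    a softmax over the discriminator logits of the fake samples, so that
    samples that fool the discriminator more receive larger weights."""
    z = np.asarray(d_logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def generator_loss(d_logits, weights=None):
    """Non-saturating generator loss, -log D(G(z)), averaged with the
    given per-sample weights (uniform weights when none are supplied)."""
    d_logits = np.asarray(d_logits, dtype=float)
    per_sample = -np.log(sigmoid(d_logits) + 1e-12)
    if weights is None:
        weights = np.full(d_logits.shape, 1.0 / d_logits.size)
    return float(np.sum(weights * per_sample))


# Three fake samples: the last one fools the discriminator the most.
d_logits = np.array([-2.0, 0.0, 3.0])
w = fake_sample_weights(d_logits)
print("weights:", w)
print("uniform loss:", generator_loss(d_logits))
print("weighted loss:", generator_loss(d_logits, w))
```

In an actual training loop the weights would be computed from the Discriminator's outputs on each batch and excluded from back-propagation, so that they only scale the per-sample gradients; how the paper normalises or bounds them is not stated in this record.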

Files (277.8 kB)

Name	Size	Checksum
ESR10_ICA2019_000761.pdf	277.8 kB	md5:e7a735f717d9721c9565e65618c98402

Additional details

European Commission
ENRICH - Enriched communication across the lifespan 675324

	All versions	This version
Views	60	60
Downloads	47	47
Data volume	13.3 MB	13.3 MB

Resource type

Conference paper

Publisher

Deutsche Gesellschaft für Akustik (DEGA e. V.) & RWTH Publications

Imprint

Proceedings of the 23rd International Congress on Acoustics, integrating 4th EAA Euroregio 2019, 5721-5725. Aachen, Germany. ISBN: 978-3-939296-15-7.

Conference

23rd International Congress on Acoustics (ICA 2019), Aachen (Germany), 9-13 September 2019 (Session 18 O - Speech enrichment: listening effort and intelligibility)

Languages

English

License: Creative Commons Attribution Non Commercial No Derivatives 4.0 International

Technical metadata

Created: November 6, 2019
Modified: July 22, 2024
