Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Su, Tung-Cheng; Chang, Yung-Chuan; Liu, Yi-Wen

doi:10.5281/zenodo.10113449

Published November 13, 2023 | Version v1

Conference paper | Open access


Singing technique conversion (STC) refers to the task of converting a singing voice from one technique to another while preserving the singer's identity, the melody, and the linguistic content. Previous STC studies, as well as singing voice conversion research in general, have used convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system that uses the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset featuring four singers and four singing techniques: chest voice, falsetto, raspy voice, and whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, the whistle voice was the easiest conversion target, whereas as conversion sources the other three techniques produced more convincing results than the whistle voice.
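To make "bottleneck width" concrete, the Python sketch below computes the latent shape of a hypothetical 1-D convolutional encoder and the resulting compression ratio for several bottleneck widths. Every setting here is an assumption for illustration, not the paper's configuration: 36 acoustic features per frame (e.g., mel-cepstra derived from the WORLD spectral envelope), 128-frame input segments, and two stride-2 layers with kernel size 5. The names `CAEConfig`, `conv_out_len`, `latent_shape`, and `compression_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CAEConfig:
    """Hypothetical 1-D convolutional encoder settings (illustrative, not the paper's)."""
    n_features: int              # acoustic features per frame (assumed: 36 mel-cepstra)
    strides: tuple[int, ...]     # temporal stride of each encoder conv layer
    bottleneck_channels: int     # "bottleneck width": channels in the latent layer
    kernel: int = 5              # assumed kernel size for every encoder layer


def conv_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1-D convolution (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def latent_shape(cfg: CAEConfig, n_frames: int) -> tuple[int, int]:
    """Return (channels, frames) of the latent code for an n_frames input."""
    length = n_frames
    for stride in cfg.strides:
        # "same"-style padding for an odd kernel, so only the stride shrinks the length
        length = conv_out_len(length, cfg.kernel, stride, padding=cfg.kernel // 2)
    return cfg.bottleneck_channels, length


def compression_ratio(cfg: CAEConfig, n_frames: int) -> float:
    """Input values per latent value; smaller means a wider, less compressive bottleneck."""
    channels, frames = latent_shape(cfg, n_frames)
    return (cfg.n_features * n_frames) / (channels * frames)


if __name__ == "__main__":
    for width in (8, 16, 32):
        cfg = CAEConfig(n_features=36, strides=(2, 2), bottleneck_channels=width)
        print(width, latent_shape(cfg, 128), compression_ratio(cfg, 128))
```

Under these assumptions, widening the bottleneck from 8 to 32 channels lowers the compression ratio from 18.0 to 4.5, so the latent code carries proportionally more of the input representation into the decoder.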

Files

cmmr2023_3e-2.pdf (2.1 MB; md5: f3cc1565d293cd441edc8ec1ba525139)

Statistics (all versions and this version): 116 views, 89 downloads, 252.6 MB data volume.

DOI: 10.5281/zenodo.10113449
Resource type: Conference paper
Publisher: Zenodo
Imprint: Proceedings of the 16th International Symposium on Computer Music Multidisciplinary Research, pp. 442–449.
Conference: The 16th International Symposium on Computer Music Multidisciplinary Research (CMMR 2023), Tokyo, Japan, 13–17 November 2023 (Session 3e, Part 2)

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 11, 2023
Modified: July 10, 2024
