WaveFake-Extension: Deep audio fakes using BigVGAN and Avocodo

Wolter, Moritz; Gasenzer, Konstantin

doi:10.48550/arXiv.2305.13033

Published January 15, 2024 | Version 1.0.0

Dataset Open

Contributors

Project members:

The primary objective of this dataset is to advance research on audio deepfakes, specifically synthetic speech generated by state-of-the-art models. As generative neural networks continue to evolve, comprehensive tools for identifying synthetic speech and mitigating the risks of its misuse become increasingly important.

Data Overview:

Our dataset extension comprises a total of 39,300 generated audio clips in 16-bit PCM WAV format. These samples were generated by two prominent neural network architectures:

BigVGAN (Lee et al., 2023a)
Avocodo (Bak et al., 2022)
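
As a quick sanity check of the stated sample format, the following minimal sketch reads a WAV header with Python's standard `wave` module and reports whether the clip is 16-bit PCM. The helper names `describe_wav` and `is_16bit_pcm` and the example path are hypothetical; nothing here is part of the dataset's own tooling.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file."""
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(str(path), "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": 8 * wav.getsampwidth(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def is_16bit_pcm(path):
    """True if the file header reports 16 bits per sample."""
    return describe_wav(path)["bits_per_sample"] == 16


# Usage (hypothetical path, adjust to wherever the archives were extracted):
# print(describe_wav("avocodo/some_clip.wav"))
```

Only the header is read, so the check stays cheap even when run over every clip.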

Data Collection Process:

Avocodo:
- Because no pre-trained weights were available, Avocodo was retrained using the publicly available implementation from Bak et al. (2023) (commit 2999557).
- Training ran for 346 epochs (563,528 steps) with hyperparameters matching Bak et al. (2022), including a learning rate of 0.0002 for both the discriminator and the generator.
- After training, the Avocodo inference script was used to generate additional LJSpeech samples.
BigVGAN:
- Two variants of BigVGAN were used: BigVGAN Large (L) with 112 million parameters, and a downsized version with 14 million parameters, referred to simply as BigVGAN.
- The code from Lee et al. (2023b) was used for both models.
- As with Avocodo, the authors' inference script was used to generate synthetic LJSpeech audio samples for both BigVGAN models.
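
The procedure above yields three sets of clips (Avocodo, BigVGAN, and BigVGAN L). If each generator re-synthesized the same set of utterances, the 39,300 clips would split into 13,100 per generator, which matches the size of LJSpeech; this even split is an inference and is not stated explicitly in the record. The sketch below computes that expectation and counts the `.wav` members of a downloaded archive for comparison. The function names are hypothetical, and the assumption that each archive stores its clips as `.wav` members is unverified.

```python
import zipfile

TOTAL_CLIPS = 39_300  # total stated in the data overview
NUM_GENERATORS = 3    # Avocodo, BigVGAN, BigVGAN L


def expected_clips_per_generator(total=TOTAL_CLIPS, generators=NUM_GENERATORS):
    """Clips per generator, assuming the total is split evenly."""
    if total % generators:
        raise ValueError("total is not evenly divisible by the generator count")
    return total // generators


def count_wav_members(zip_path):
    """Count archive members whose name ends in .wav (case-insensitive)."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for name in archive.namelist() if name.lower().endswith(".wav"))


# Usage (assumes avocodo.zip was downloaded to the working directory):
# print(count_wav_members("avocodo.zip"), expected_clips_per_generator())
```

A mismatch between the two numbers would not necessarily indicate a corrupt download; it may simply mean the even-split assumption does not hold.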

Dataset License:

This dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0), allowing for broad use and collaboration within the research community.

Acknowledgments:

The research leading to the development of this dataset was supported by the Bundesministerium für Bildung und Forschung (BMBF) through the WestAI and BnTrAInee projects. The authors express their gratitude to the Gauss Centre for Supercomputing e.V. for funding the project and providing computing resources through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC).

Files

Total size: 9.8 GB

Name	MD5	Size
avocodo.zip	303565a9d959ad831eb0ae379da80594	3.4 GB
bigvgan.zip	9e0aa71ba45b814588b4af722d06ec46	3.2 GB
lbigvgan.zip	2d41afd9b59b1d028b9369ecf0b8fd67	3.2 GB
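
Since each archive is several gigabytes, it may be worth verifying the downloads against the MD5 checksums listed above before extracting them. The following is a minimal sketch using only Python's standard library; the function names and the download directory are hypothetical, and the checksums are copied from the file listing.

```python
import hashlib
from pathlib import Path

# Checksums as published in the file listing of this record.
EXPECTED_MD5 = {
    "avocodo.zip": "303565a9d959ad831eb0ae379da80594",
    "bigvgan.zip": "9e0aa71ba45b814588b4af722d06ec46",
    "lbigvgan.zip": "2d41afd9b59b1d028b9369ecf0b8fd67",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results


# Usage (assumes the archives were saved to ./downloads):
# print(verify_downloads("downloads"))
```

A `False` entry means the file is either missing or differs from the published archive, in which case re-downloading is the simplest remedy.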

Additional details

Is supplement to: Publication: 10.48550/arXiv.2305.13033 (DOI)

Created: 2024-01

References

Lee, S. G., Ping, W., Ginsburg, B., Catanzaro, B., & Yoon, S. (2022). BigVGAN: A universal neural vocoder with large-scale training. arXiv preprint arXiv:2206.04658.
Bak, T., Lee, J., Bae, H., Yang, J., Bae, J. S., & Joo, Y. S. (2023, June). Avocodo: Generative adversarial network for artifact-free vocoder. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 11, pp. 12562-12570).

