
Published January 15, 2024 | Version 1.0.0 | Dataset | Open access

WaveFake-Extension: Deep audio fakes using BigVGAN and Avocodo

Description

The primary objective of this dataset is to advance research on audio deepfakes, focusing specifically on synthetic speech generated by state-of-the-art models. As generative neural networks continue to improve, tools that can identify synthetic speech and mitigate the risks of its misuse become increasingly important.

Data Overview:

Our dataset extension comprises 39,300 generated audio clips in 16-bit PCM WAV format, produced by two prominent neural vocoder architectures:

  • BigVGAN (Lee et al., 2023a)
  • Avocodo (Bak et al., 2022)
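The stated audio format can be checked with Python's standard-library `wave` module. The following is a minimal sketch, not part of the dataset's tooling; `describe_wav` is a hypothetical helper name. It reports whether a clip is 16-bit PCM (two bytes per sample, uncompressed) together with its sample rate, channel count, and duration.

```python
import wave


def describe_wav(path):
    """Read a WAV header and report whether the clip is 16-bit PCM."""
    # wave only opens uncompressed PCM files; other encodings raise wave.Error
    with wave.open(str(path), "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        compression = wav.getcomptype()    # "NONE" for uncompressed PCM
        rate = wav.getframerate()
        frames = wav.getnframes()
        channels = wav.getnchannels()
    return {
        "is_16bit_pcm": sample_width == 2 and compression == "NONE",
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
    }
```

If the stated format holds, `describe_wav(path)["is_16bit_pcm"]` should be `True` for every extracted clip; reading only the header keeps this check fast even across tens of thousands of files.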

Data Collection Process:

  1. Avocodo:

    • No pre-trained weights were available, so Avocodo was retrained with the publicly available implementation from Bak et al. (2023) (commit 2999557).
    • The model was trained for 346 epochs (563,528 steps) with hyperparameters matching Bak et al. (2022), including a learning rate of 0.0002 for both the generator and the discriminator.
    • After training, the Avocodo inference script was used to generate additional synthetic LJSpeech samples.
  2. BigVGAN:

    • Two BigVGAN variants were used: BigVGAN Large (L), with 112 million parameters, and a smaller version with 14 million parameters, referred to simply as BigVGAN.
    • Both models were run with the code released by Lee et al. (2023b).
    • As with Avocodo, the authors' inference script was used to generate synthetic LJSpeech samples with both BigVGAN models.
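For detection experiments it is often convenient to label each clip with the generator that produced it. The sketch below assumes a hypothetical layout in which each archive is extracted into its own directory named after its generator (only avocodo.zip is a confirmed archive name; any other directory names are placeholders). `build_manifest` and `count_per_generator` are illustrative helpers, not part of the dataset.

```python
from collections import Counter
from pathlib import Path

# Assumed (hypothetical) layout after extraction:
#   root/avocodo/*.wav, root/<other_generator>/*.wav, ...
# The top-level directory name is used as the generator label.


def build_manifest(root):
    """Return sorted (wav_path, generator_label) pairs for WAVs under root/<generator>/."""
    root = Path(root)
    manifest = []
    for generator_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also picks up WAVs nested in deeper subdirectories
        for wav_path in sorted(generator_dir.rglob("*.wav")):
            manifest.append((wav_path, generator_dir.name))
    return manifest


def count_per_generator(manifest):
    """Count how many clips carry each generator label."""
    return Counter(label for _, label in manifest)
```

The stated total of 39,300 clips equals 3 × 13,100, the number of utterances in LJSpeech. This is consistent with, but not confirmed to mean, each of the three generator configurations re-synthesizing the full corpus; `count_per_generator` offers a quick sanity check of that assumption.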

Dataset License:

This dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0), allowing broad use and collaboration within the research community.

Acknowledgments:

The research leading to the development of this dataset was supported by the Bundesministerium für Bildung und Forschung (BMBF) through the WestAI and BnTrAInee projects. The authors express their gratitude to the Gauss Centre for Supercomputing e.V. for funding the project and providing computing resources through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC).


Files (9.8 GB)

Three archives are provided, including avocodo.zip:

  • md5:303565a9d959ad831eb0ae379da80594 (3.4 GB)
  • md5:9e0aa71ba45b814588b4af722d06ec46 (3.2 GB)
  • md5:2d41afd9b59b1d028b9369ecf0b8fd67 (3.2 GB)
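Before extracting multi-gigabyte archives, it is worth confirming that each download matches the MD5 checksum listed for it. The following minimal sketch uses Python's standard `hashlib` and reads the file in chunks so that a 3 GB archive never has to fit in memory; `md5_of_file` and `verify_checksum` are hypothetical helper names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """Compare a file with a checksum written as 'md5:<hex>' or as bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Each downloaded archive should be compared against the checksum listed next to it on the record page; the `md5:` prefix can be passed through unchanged.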

Additional details

Related works:

  • Is supplement to publication 10.48550/arXiv.2305.13033 (DOI)

Dates:

  • Created: 2024-01

References

  • Lee, S. G., Ping, W., Ginsburg, B., Catanzaro, B., & Yoon, S. (2022). BigVGAN: A universal neural vocoder with large-scale training. arXiv preprint arXiv:2206.04658.
  • Bak, T., Lee, J., Bae, H., Yang, J., Bae, J. S., & Joo, Y. S. (2023). Avocodo: Generative adversarial network for artifact-free vocoder. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 11, pp. 12562-12570).