Published December 4, 2023 | Version v1
Conference paper Open

An Open Dataset of Synthetic Speech

  • 1. Fraunhofer Institute for Digital Media Technology
  • 2. Centre for Research and Technology Hellas

Description

This paper introduces a multilingual, multispeaker dataset of synthetic and natural speech, designed to foster research and benchmarking in synthetic speech detection. The dataset comprises 18,993 audio utterances synthesized from text, amounting to approximately 17 hours of synthetic audio, along with their natural counterparts. The synthetic speech is generated by 156 voices in three languages (English, German, and Spanish) with balanced gender representation. The dataset targets state-of-the-art synthesis methods and is released under a license that allows the research community to extend and redistribute it.

Notes

The final version of the paper published by IEEE is available online at https://doi.org/10.1109/WIFS58808.2023.10374863.

Files

Name Size
IEEE_WIFS_2023___Dataset_of_Synthetic_Speech.pdf 128.5 kB
md5:51b6550c813d5c5cbf71c8c2934b21b9

Additional details

Related works

Describes
Dataset: 10.5281/zenodo.8370668 (DOI)

Funding

European Commission
AI4Media – A European Excellence Centre for Media, Society and Democracy (grant 951911)
European Commission
vera.ai: VERification Assisted by Artificial Intelligence (grant 101070093)

Dates

Accepted
2023-09-15