LlamaPartialSpoof
Description
LlamaPartialSpoof Dataset v1.0.b
This dataset is made available under Creative Commons Attribution 4.0 International License (CC BY 4.0).
For any inquiries, please email me at contact @ hieuthi.com
v1.0.b includes two parts:
- R01TTS.0.a contains bonafide, fully fake (TTS001--006), and partially fake (TTS001--006) samples created using crossfade
- R01TTS.0.b contains partially fake (TTS001--006) samples created using cut/paste or overlap/add
The labels of each part are in text files with corresponding names, while the speech samples are in the archives.
Each line in the label files is formatted as follows: <id> <utterance-duration> <utterance-label> <segment1> <segment2> ... <segmentN>
Each segment is formatted as follows: <start>-<end>-<label>
Label is either "bonafide" or "spoof".
The id is also the name of the corresponding wave file. It explicitly states the name of the TTS model used for generation.
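The label format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes that fields are separated by whitespace, that durations and segment boundaries are non-negative numbers (the unit is not stated here), and that labels contain no hyphen. The names `Segment`, `Utterance`, `parse_label_line`, and `spoof_duration` are hypothetical, and the example line is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float
    end: float
    label: str  # "bonafide" or "spoof"


@dataclass
class Utterance:
    utt_id: str
    duration: float
    label: str
    segments: List[Segment]


def parse_label_line(line: str) -> Utterance:
    """Parse '<id> <utterance-duration> <utterance-label> <segment1> ... <segmentN>'."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    utt_id, duration, label = fields[:3]
    segments = []
    for field in fields[3:]:
        # Each segment is '<start>-<end>-<label>'; assumes non-negative times.
        start, end, seg_label = field.split("-", 2)
        segments.append(Segment(float(start), float(end), seg_label))
    return Utterance(utt_id, float(duration), label, segments)


def spoof_duration(utt: Utterance) -> float:
    """Total length of the segments labeled 'spoof'."""
    return sum(s.end - s.start for s in utt.segments if s.label == "spoof")


# Made-up line for illustration only; it is not taken from the dataset.
example = "example_utt 3.50 spoof 0.00-1.20-bonafide 1.20-2.00-spoof 2.00-3.50-bonafide"
utt = parse_label_line(example)
print(utt.utt_id, utt.label, len(utt.segments))
```

Keeping segments as typed records makes it straightforward to derive frame-level or segment-level targets later, for example by summing the spoofed portion with `spoof_duration`.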
The dataset also includes a metadata file (metadata_crossfade.csv) that contains information about the fading function used to create the partially fake samples (crossfade). The fading functions are coded as t=triangle (linear), q=quarter of sinewave, h=half of sinewave, l=logarithmic, p=inverted parabola.
Acknowledgments
Dr. Hieu-Thi Luong is funded by RIE2025 NRF International Partnership Funding Initiative. This research is supported by the National Research Foundation, Singapore, under the AI Singapore Programme (AISG Award No.: AISG2-TC-2023-011-SGIL). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.
Files
(28.2 GB)
Checksum | Size |
---|---|
md5:86c60280e4cb2957542b365c3c8f52ac | 10.5 MB |
md5:6f5f94d3dbca70c011370ded232c8519 | 14.2 MB |
md5:2ab724713fdaf49e4523c4503bfd068d | 18.7 kB |
md5:18c35c8fe5aa092ff61208c12e095f4c | 1.7 MB |
md5:685acfe986b50baaf3e25e9d5e3091a4 | 15.4 GB |
md5:a4de860a845816fa65785dddd7849700 | 12.8 GB |
md5:aac4156c57c6dd05abc7edc65c71af72 | 1.2 kB |
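The md5 checksums listed for each file can be used to confirm that a download, in particular the multi-gigabyte archives, arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_md5` are hypothetical, and the path in the usage comment is a placeholder, since file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's digest with an expected value such as 'md5:86c6...'."""
    expected_hex = expected.lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the file you actually downloaded):
# ok = verify_md5("path/to/downloaded_file", "md5:86c60280e4cb2957542b365c3c8f52ac")
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size.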
Additional details
Software
- Repository URL: https://github.com/hieuthi/LlamaPartialSpoof
References
- Luong, H. T., Li, H., Zhang, L., Lee, K. A., & Chng, E. S. (2024). LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation. arXiv preprint arXiv:2409.14743.