Published December 1, 2022 | Version 1
Dataset Open

Latin-American voice anti-spoofing dataset

  • 1. Universidad de los Andes

Description

This dataset contains spoofed and real (bonafide) human voice samples with different accents from Latin-American countries.

Table 1. Real samples distribution

Accent        Gender    # Speakers    # Files    Nomenclature
Colombian     Male              17       2534    com
Colombian     Female            14       2070    cof
Chilean       Male              17       2487    clm
Chilean       Female            12       1602    clf
Peruvian      Male              20       2917    pem
Peruvian      Female            18       2529    pef
Venezuelan    Male              12       1754    vem
Venezuelan    Female            10       1463    vef
Argentinian   Male              12       1670    arm
Argentinian   Female            30       3790    arf
Total                          162      22816

 

The bonafide samples were obtained from the following sources:

 

The strategies used to generate the spoof samples:

Table 2. Spoof samples distribution

Name           Type                                 # Samples
StarGAN        Voice conversion                         16000
CycleGAN       Voice conversion                         16000
Diffusion      Voice conversion                         16000
TTS            Text-to-speech                            5000
TTS-StarGAN    Text-to-speech / Voice conversion         2500
TTS-Diff       Text-to-speech / Voice conversion         2500

 

 

Table 3. Dataset overview

Audio samples          Bonafide: 22816    Spoof: 58000
Human speakers         Male: 78           Female: 84
Spoofing algorithms    VC: 3              TTS: 1            VC and TTS: 2
Sampling rate          16 kHz
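The totals in Table 3 can be cross-checked against Tables 1 and 2. The snippet below is only a sanity-check sketch: the per-group counts are copied from those tables, and the variable names are illustrative.

```python
# File counts per nomenclature code, copied from Table 1.
real_files = {
    "com": 2534, "cof": 2070, "clm": 2487, "clf": 1602, "pem": 2917,
    "pef": 2529, "vem": 1754, "vef": 1463, "arm": 1670, "arf": 3790,
}

# Sample counts per spoofing strategy, copied from Table 2.
spoof_samples = {
    "StarGAN": 16000, "CycleGAN": 16000, "Diffusion": 16000,
    "TTS": 5000, "TTS-StarGAN": 2500, "TTS-Diff": 2500,
}

bonafide_total = sum(real_files.values())
spoof_total = sum(spoof_samples.values())
spoof_fraction = spoof_total / (bonafide_total + spoof_total)

print(bonafide_total, spoof_total)  # 22816 58000
print(f"spoof share of all samples: {spoof_fraction:.1%}")
```

The spoof class is roughly 2.5 times larger than the bonafide class, which may matter when choosing evaluation metrics or sampling strategies.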

 

The protocol.txt file lists all the files using the following structure:

subject_id  file_name  -  spoof_type  label

Consider this line from the protocol.txt file:
arf_00295 StarGAN-arf_00295_01349969200-cof_03349_0077577 - StarGAN spoof
The first part (arf_00295) is the subject ID, from which the accent and the gender can also be identified (see the Nomenclature column in Table 1). The file name identifies the type of spoof, followed by the source audio file and the target file. StarGAN is the spoof type; according to Table 2, this method is a voice conversion algorithm. If the file is a bonafide sample, the spoof_type is replaced with a dash (-). The last field of the line is the label of the file; in this example, the file is a spoof sample.
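The line structure described above can be parsed with a few lines of Python. This is a minimal sketch, not an official loader: it assumes five whitespace-separated fields with a literal dash in the third position, and it uses the example line with the stray spaces inside the file name removed. `ProtocolEntry` and `parse_protocol_line` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Accent and gender codes taken from the Nomenclature column of Table 1.
ACCENTS = {"co": "Colombian", "cl": "Chilean", "pe": "Peruvian",
           "ve": "Venezuelan", "ar": "Argentinian"}
GENDERS = {"m": "Male", "f": "Female"}


class ProtocolEntry(NamedTuple):
    subject_id: str
    accent: str
    gender: str
    file_name: str
    spoof_type: Optional[str]  # None when the field is a dash (bonafide)
    label: str


def parse_protocol_line(line: str) -> ProtocolEntry:
    # Assumption: exactly five whitespace-separated fields per line.
    subject_id, file_name, _dash, spoof_type, label = line.split()
    code = subject_id.split("_")[0]  # e.g. "arf"
    return ProtocolEntry(
        subject_id=subject_id,
        accent=ACCENTS[code[:2]],
        gender=GENDERS[code[2]],
        file_name=file_name,
        spoof_type=None if spoof_type == "-" else spoof_type,
        label=label,
    )


example = "arf_00295 StarGAN-arf_00295_01349969200-cof_03349_0077577 - StarGAN spoof"
entry = parse_protocol_line(example)
print(entry.accent, entry.gender, entry.spoof_type, entry.label)
# Argentinian Female StarGAN spoof
```

If file names in the actual protocol.txt contain spaces, the five-field unpacking raises a `ValueError`, which makes the mismatch visible instead of silently misreading the line.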

Each zip file contains 6 folders, each holding one type of sample. Each voice conversion folder has 25 sub-folders that indicate the conversion between accents. For example, the Argentina-Venezuela folder indicates that the source accent of the files is Argentinian and the target accent is Venezuelan. Inside that folder there are 64 sub-folders that represent the subjects used for the conversion. For instance, the folder arf_00295-vem_04310 means that the source is an Argentinian female and the target is a Venezuelan male (see Table 1 for the nomenclature). A Text-to-Speech folder has 5 sub-folders, one per accent. A TTS-VC folder has 2 sub-folders that represent the voice conversion strategy used; inside them there are further sub-folders for the different combinations of source and target accents.
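The subject-pair folder names can be decoded with the Table 1 nomenclature. The sketch below assumes that such a folder name is always two subject IDs joined by a single hyphen; the function names are hypothetical.

```python
from itertools import product
from typing import Tuple

# Accent and gender codes taken from the Nomenclature column of Table 1.
ACCENTS = {"co": "Colombian", "cl": "Chilean", "pe": "Peruvian",
           "ve": "Venezuelan", "ar": "Argentinian"}
GENDERS = {"m": "male", "f": "female"}


def describe_speaker(subject_id: str) -> str:
    """Turn a subject ID such as 'arf_00295' into 'Argentinian female'."""
    code = subject_id.split("_")[0]
    return f"{ACCENTS[code[:2]]} {GENDERS[code[2]]}"


def parse_conversion_folder(name: str) -> Tuple[str, str]:
    """Split '<source_id>-<target_id>' into its two subject IDs."""
    source_id, target_id = name.split("-")
    return source_id, target_id


source_id, target_id = parse_conversion_folder("arf_00295-vem_04310")
print(describe_speaker(source_id), "->", describe_speaker(target_id))
# Argentinian female -> Venezuelan male

# Five accents give 5 x 5 ordered (source, target) pairs, which matches
# the 25 accent sub-folders per voice conversion folder.
accent_pairs = list(product(ACCENTS.values(), repeat=2))
print(len(accent_pairs))  # 25
```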

You can check the folder tree structure in the tree.txt file. Table 3 shows a summary of the resulting dataset.

 

Files

Files (7.5 GB)

Checksum: md5:9925c97ecbb63863bd489ed25706db04
Size: 7.5 GB
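The MD5 checksum listed above can be used to verify the download. A minimal sketch using Python's standard `hashlib` module follows; the archive path `dataset.zip` is a placeholder, since the actual file name is not given here.

```python
import hashlib
from pathlib import Path

# Checksum from the Files listing of this record.
EXPECTED_MD5 = "9925c97ecbb63863bd489ed25706db04"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected


archive = Path("dataset.zip")  # hypothetical path; replace with the real file
if archive.exists():
    print("checksum OK" if verify_download(archive) else "checksum mismatch")
```

Reading in chunks keeps memory use low for a 7.5 GB archive.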