Latin-American voice anti-spoofing dataset

Tamayo, Pablo; Manrique, Ruben

doi:10.5281/zenodo.7370805

Published December 1, 2022 | Version 1

Dataset Open

1. Universidad de los Andes

This dataset contains spoofed and real (bonafide) human voice samples with different accents from Latin-American countries.

Table 1. Real samples distribution

Accent	Gender	# Speakers	# Files	Nomenclature
Colombian	Male	17	2534	com
Colombian	Female	14	2070	cof
Chilean	Male	17	2487	clm
Chilean	Female	12	1602	clf
Peruvian	Male	20	2917	pem
Peruvian	Female	18	2529	pef
Venezuelan	Male	12	1754	vem
Venezuelan	Female	10	1463	vef
Argentinian	Male	12	1670	arm
Argentinian	Female	30	3790	arf
Total		162	22816	
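The Nomenclature column combines a two-letter accent code with a one-letter gender code. A minimal Python sketch of decoding such a code follows; the function name `decode_code` and the two lookup tables are illustrative and not part of the dataset.

```python
# Accent and gender codes as listed in the Nomenclature column of Table 1.
ACCENTS = {
    "co": "Colombian",
    "cl": "Chilean",
    "pe": "Peruvian",
    "ve": "Venezuelan",
    "ar": "Argentinian",
}
GENDERS = {"m": "Male", "f": "Female"}


def decode_code(code):
    """Map a three-letter code such as 'arf' to an (accent, gender) pair.

    Hypothetical helper: the dataset itself ships no such function.
    """
    code = code.strip().lower()
    if len(code) != 3 or code[:2] not in ACCENTS or code[2] not in GENDERS:
        raise ValueError(f"unrecognised nomenclature code: {code!r}")
    return ACCENTS[code[:2]], GENDERS[code[2]]


if __name__ == "__main__":
    for example in ("com", "clf", "arf"):
        print(example, decode_code(example))
```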

The bonafide samples were obtained from the following sources:

Colombian accents: https://www.openslr.org/72/ (License)
Chilean accents: https://www.openslr.org/71/ (License)
Peruvian accents: https://www.openslr.org/73/ (License)
Venezuelan accents: https://www.openslr.org/75/ (License)
Argentinian accents: https://www.openslr.org/61/ (License)

The following strategies were used to generate the spoof samples:

Table 2. Spoof Samples distribution

Name	Type	#Samples
StarGAN	Voice conversion	16000
CycleGAN	Voice conversion	16000
Diffusion	Voice conversion	16000
TTS	Text-to-speech	5000
TTS-StarGAN	Text-to-speech / Voice conversion	2500
TTS-Diff	Text-to-speech / Voice conversion	2500

StarGAN-VC: Non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks
CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
TTS: Microsoft Azure TTS
TTS-VC: Microsoft Azure TTS + StarGAN/Diff

Table 3. Dataset overview

Category	Item	Value
Audio samples	Bonafide	22816
Audio samples	Spoof	58000
Human speakers	Male	78
Human speakers	Female	84
Spoofing algorithms	VC	3
Spoofing algorithms	TTS	1
Spoofing algorithms	VC and TTS	2
Sampling rate		16 kHz

The protocol.txt file lists all the files, one per line, with the following structure:

Subject_id file_name - spoof_type Label

Consider this line from the protocol.txt file:
arf_00295 StarGAN-arf _00295_01349969200-cof _03349 _0077577 - StarGAN spoof
The first part (arf_00295) represents the subject id, from which we can also identify the accent and the gender (see the Nomenclature column in Table 1). The file name identifies the type of spoof, followed by the source audio file and the target file. StarGAN represents the type of spoof; according to Table 2, this method is a Voice Conversion algorithm. If the file is a bonafide sample, the spoof_type is replaced with a dash (-). Finally, the end of the line gives the label of the file; in the example, the file is a spoof.
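To make the line structure concrete, here is a rough Python sketch of a parser for one protocol line. It rests on assumptions not confirmed by the dataset description: fields are separated by whitespace, the first token is the subject id, the last three tokens are the dash separator, the spoof type and the label, and any whitespace inside the file name (as in the example line above) is treated as an extraction artifact and removed. The names `ProtocolEntry` and `parse_protocol_line` are hypothetical.

```python
from typing import NamedTuple, Optional

DASHES = ("-", "\u2013")  # plain hyphen and en dash, both seen in the description


class ProtocolEntry(NamedTuple):
    subject_id: str
    file_name: str
    spoof_type: Optional[str]  # None when the field holds a dash (bonafide)
    label: str


def parse_protocol_line(line):
    """Parse 'Subject_id file_name - spoof_type Label' into a ProtocolEntry."""
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(tokens)}")
    separator, spoof_type, label = tokens[-3], tokens[-2], tokens[-1]
    if separator not in DASHES:
        raise ValueError(f"missing dash separator in line: {line!r}")
    # Assumption: spaces inside the file name are artifacts, so the middle
    # tokens are joined back together without a separator.
    file_name = "".join(tokens[1:-3])
    return ProtocolEntry(
        subject_id=tokens[0],
        file_name=file_name,
        spoof_type=None if spoof_type in DASHES else spoof_type,
        label=label,
    )


if __name__ == "__main__":
    example = ("arf_00295 StarGAN-arf _00295_01349969200-cof _03349 "
               "_0077577 - StarGAN spoof")
    print(parse_protocol_line(example))
```

Parsing from both ends of the line keeps the sketch usable whether or not the file name really contains spaces; if it does, joining the middle tokens with `" "` instead would preserve them.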

Each zip file contains 6 folders, one per type of sample. Each voice conversion folder contains 25 sub-folders that indicate the conversion between accents. For example, the Argentina-Venezuela folder indicates that the source accent of the file is Argentinian and the target accent is Venezuelan. Inside the folder there are 64 sub-folders that represent the subjects used for the conversion. For instance, the folder arf_00295-vem_04310 means that the source is an Argentinian female and the target is a Venezuelan male (see Table 1 for nomenclature). A Text-to-Speech folder contains 5 sub-folders that represent the accents. A TTS-VC folder contains 2 sub-folders that represent the voice conversion strategy used; inside them are further sub-folders for the different combinations of source and target accents.
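As an illustration of the subject-pair folder naming, the sketch below splits a folder name such as arf_00295-vem_04310 into its source and target subjects. It assumes the name always has the form `<source_id>-<target_id>` and that each subject id starts with a three-letter code from Table 1 followed by an underscore; the helper names are hypothetical.

```python
# Codes from the Nomenclature column of Table 1.
ACCENTS = {"co": "Colombian", "cl": "Chilean", "pe": "Peruvian",
           "ve": "Venezuelan", "ar": "Argentinian"}
GENDERS = {"m": "male", "f": "female"}


def describe_subject(subject_id):
    """Turn a subject id like 'arf_00295' into a small description dict."""
    code = subject_id.split("_", 1)[0]
    if len(code) != 3 or code[:2] not in ACCENTS or code[2] not in GENDERS:
        raise ValueError(f"unrecognised subject id: {subject_id!r}")
    return {"id": subject_id,
            "accent": ACCENTS[code[:2]],
            "gender": GENDERS[code[2]]}


def parse_pair_folder(folder_name):
    """Split '<source_id>-<target_id>' into (source, target) descriptions."""
    parts = folder_name.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '-' in {folder_name!r}")
    return describe_subject(parts[0]), describe_subject(parts[1])


if __name__ == "__main__":
    source, target = parse_pair_folder("arf_00295-vem_04310")
    print("source:", source)
    print("target:", target)
```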

You can check the folder tree structure in the tree.txt file. Table 3 shows a summary of the resulting dataset.

Files (7.5 GB)

Name	Size
Latin_America_Spanish_anti_spoofing_dataset.rar (md5:9925c97ecbb63863bd489ed25706db04)	7.5 GB
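The listed MD5 checksum can be used to confirm that the archive downloaded intact. The following is a minimal Python sketch using the standard `hashlib` module; it assumes the archive sits in the current working directory under the name shown above, and the function name `md5_of_file` is illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as published alongside the archive.
EXPECTED_MD5 = "9925c97ecbb63863bd489ed25706db04"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location of the downloaded archive.
    archive = Path("Latin_America_Spanish_anti_spoofing_dataset.rar")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif md5_of_file(archive) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch; the download may be corrupted")
```

Reading in chunks keeps memory use flat, which matters for a 7.5 GB file.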
