AVSpoof
- 1. Idiap Research Institute
Description
AVSpoof is a dataset for speaker recognition and voice presentation attack detection (anti-spoofing). It is intended to provide stable, unbiased presentation attacks (spoofing attacks) so that researchers can test both speaker recognition (ASV) systems and presentation attack detection (anti-spoofing) techniques. Data acquisition lasted two months and involved 44 participants, each taking part in several sessions recorded under different environmental conditions and setups.
Data acquisition was divided into four sessions, scheduled several days apart, with different setups and environmental conditions (e.g. background noise and reverberation) for each of the 31 male and 13 female participants. The first session, which is intended to serve as the training set when creating the attacks, was recorded under the most controlled conditions. The conditions for the last three sessions, which are dedicated to test trials, were more relaxed in order to capture challenging scenarios. The audio was recorded with three devices: (a) a good-quality microphone, the AT2020USB+, and two mobile phones, (b) a Samsung Galaxy S4 (phone1) and (c) an iPhone 3GS (phone2).
The position of the devices was kept fixed for each session and each participant in order to standardize the recording settings.
In each session, the participant followed three data acquisition protocols:
- Reading part (read): 10/40 pre-defined sentences are read by the participant.
- Pass-phrases part (pass): 5 short prompts are read by the participant.
- Free speech part (free): The participant speaks freely about any topic for 3 to 10 minutes.
The number, length and content of the sentences for the reading and pass-phrase parts were carefully selected to satisfy constraints on readability, data acquisition and attack quality. Similarly, the minimum duration of the free speech part was determined from our preliminary investigations, mostly on voice conversion attacks, for which the free speech data would be included in the training set.
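The recording design described above (four sessions, three devices, three parts) can be sketched as a small Python structure. This is only an illustration: the labels phone1, phone2, read, pass and free come from the description, while the label "mic", the class name RecordingCondition and the role names are hypothetical and do not reflect the dataset's actual file or directory layout. The sketch also assumes that every device recorded every part in every session, which the description does not state explicitly.

```python
from dataclasses import dataclass
from itertools import product

# Labels from the description; "mic" is a hypothetical label for the AT2020USB+.
SESSIONS = (1, 2, 3, 4)
DEVICES = ("mic", "phone1", "phone2")
PARTS = ("read", "pass", "free")


@dataclass(frozen=True)
class RecordingCondition:
    """One (session, device, part) combination for a single participant."""
    session: int
    device: str
    part: str

    @property
    def role(self) -> str:
        # Session 1 is described as training material for creating attacks;
        # sessions 2 to 4 are described as test trials.
        # The role names themselves are hypothetical.
        return "attack-training" if self.session == 1 else "test"


def enumerate_conditions():
    """Return every combination, assuming a complete recording grid."""
    return [
        RecordingCondition(session, device, part)
        for session, device, part in product(SESSIONS, DEVICES, PARTS)
    ]


conditions = enumerate_conditions()
# 4 sessions x 3 devices x 3 parts
print(len(conditions))
```

Under these assumptions each participant contributes 36 conditions, 9 of which belong to the first (training) session.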
References
S. K. Ergünay, E. Khoury, A. Lazaridis, and S. Marcel. On the vulnerability of speaker verification to realistic voice spoofing. In Proc. Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS), 2015.
DOI: 10.1109/BTAS.2015.7358783
http://publications.idiap.ch/index.php/publications/show/3185