Published July 22, 2015 | Version v1
Dataset Restricted

AVSpoof

Description

AVSpoof is a dataset for speaker recognition and voice presentation attack detection (anti-spoofing). The dataset is intended to provide stable, unbiased presentation attacks (spoofing attacks) so that researchers can test both speaker recognition (ASV systems) and presentation attack detection (anti-spoofing) techniques. The data acquisition process lasted two months and involved 44 participants, each taking part in several sessions configured with different environmental conditions and setups.

The data acquisition process was divided into four sessions, scheduled several days apart, in different setups and environmental conditions (e.g. differing in background noise, reverberation, etc.) for each of the 31 male and 13 female participants. The first session, which is intended to be used as the training set when creating the attacks, was recorded under the most controlled conditions. The conditions for the last three sessions, which are dedicated to test trials, were more relaxed in order to capture more challenging scenarios. The audio data were recorded by three different devices: (a) one good-quality microphone, AT2020USB+, and two mobile phones, (b) Samsung Galaxy S4 (phone1) and (c) iPhone 3GS (phone2).

The positioning of the devices was stabilized for each session and each participant in order to standardize the recording settings.

In each session, the participant followed three different data acquisition protocols:

  • Reading part (read): 10/40 pre-defined sentences are read by the participant.
  • Pass-phrases part (pass): 5 short prompts are read by the participant.
  • Free speech part (free): The participant speaks freely about any topic for 3 to 10 minutes.

The number, length, and content of the sentences for the reading and pass-phrases parts were carefully selected to satisfy constraints on readability, data acquisition, and attack quality. Similarly, the minimum duration of the free speech part was determined from our preliminary investigations, mostly on voice conversion attacks, for which the free speech data would be included in the training set.

References

S. K. Ergünay, E. Khoury, A. Lazaridis, and S. Marcel. On the vulnerability of speaker verification to realistic voice spoofing. In Proc. Int. Conf. on Biometrics: Theory, Applications and Systems (BTAS), 2015.
DOI: 10.1109/BTAS.2015.7358783
http://publications.idiap.ch/index.php/publications/show/3185

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

Access to the dataset is based on an End-User License Agreement. The use of the dataset is strictly restricted to non-commercial research.

Please provide us with the following information about the authorized signatory (who MUST hold a permanent position):

  • Full name
  • Name of organization
  • Position / job title
  • Academic / professional email address
  • URL where we can verify the information details

Only academic/professional email addresses from the same organization as the signatory are accepted for the online request. All online requests coming from generic email providers such as Gmail will be rejected.

Additional details

Related works

Is documented by
Conference paper: 10.1109/BTAS.2015.7358783 (DOI)

Funding

BEAT – Biometrics Evaluation and Testing 284989
European Commission