Published December 12, 2018 | Version v1
Dataset Restricted


  • 1. Idiap Research Institute
  • 2. Research and Development Center in Telecommunications


VoicePA is a dataset for speaker recognition and voice presentation attack detection (anti-spoofing). The dataset contains a set of Bona Fide (genuine) voice samples from 44 speakers and 24 different types of speech presentation attacks (spoofing attacks). The attacks were created using the Bona Fide data recorded for the AVspoof dataset.


Genuine data

The genuine (non-attack) data is taken directly from the 'AVspoof' database and can be used by both automatic speaker verification (ASV) and presentation attack detection (PAD) systems (the folder 'genuine' contains this data). The genuine data acquisition lasted approximately two months and involved 44 subjects, each participating in four sessions configured in different environmental setups. During each recording session, subjects were asked to produce prepared (read) speech, pass-phrases, and free speech, recorded with three devices: one laptop with a high-quality microphone and two mobile phones (iPhone 3GS and Samsung S3).


Attack data

Based on the genuine data, 24 types of presentation attacks were generated. Attacks were recorded in three different environments (two typical offices and a large conference room) using five different playback devices: built-in laptop speakers, high-quality speakers, and three phones (iPhone 3GS, iPhone 6S, and Samsung S3). The recordings assume an ASV system running on a laptop, an iPhone 3GS, or a Samsung S3. In addition to replay attacks (speech is recorded and replayed to the microphone of an ASV system), two types of synthetic speech were also replayed: speech synthesis and voice conversion (for details on these algorithms, please refer to the BTAS 2015 paper describing the 'AVspoof' database).



The data in the 'voicePA' database is split into three non-overlapping subsets: training (genuine and attack samples from 4 female and 10 male subjects), development or 'Dev' (genuine and attack samples from 4 female and 10 male subjects), and evaluation or 'Eval' (genuine and attack samples from 5 female and 11 male subjects).
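As a quick sanity check on the split described above, the following Python sketch tallies the per-subset speaker counts and confirms that they add up to the 44 speakers in the dataset. The counts are copied from the description; the names SPLITS, subset_sizes, and are_disjoint are hypothetical helpers introduced for illustration and are not part of any tooling distributed with the dataset. The dataset's actual speaker identifiers are not given here, so are_disjoint simply accepts whatever ID lists you supply.

```python
# Speaker counts per subset, copied from the dataset description.
# The dictionary layout and key names are illustrative assumptions.
SPLITS = {
    "train": {"female": 4, "male": 10},
    "dev": {"female": 4, "male": 10},
    "eval": {"female": 5, "male": 11},
}


def subset_sizes(splits):
    """Return the number of speakers in each subset."""
    return {name: sum(counts.values()) for name, counts in splits.items()}


def are_disjoint(speakers_by_subset):
    """Return True if no speaker ID appears in more than one subset."""
    seen = set()
    for ids in speakers_by_subset.values():
        ids = set(ids)
        if seen & ids:
            return False
        seen |= ids
    return True


sizes = subset_sizes(SPLITS)
total_speakers = sum(sizes.values())
print(sizes, total_speakers)
```

Once speaker IDs have been read from the protocol files for each subset, passing them to are_disjoint as a mapping from subset name to a list of IDs gives a simple way to confirm that no speaker leaks between training, development, and evaluation.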



Pavel Korshunov, André R. Goncalves, Ricardo P. V. Violato, Flávio O. Simões and Sébastien Marcel. "On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection", International Conference on Identity, Security and Behavior Analysis, 2018.



The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

Access to the dataset is based on an End-User License Agreement. The use of the dataset is strictly restricted to non-commercial research.

Please provide us the following information about the authorized signatory (MUST hold a permanent position):

  • Full name
  • Name of organization
  • Position / job title
  • Academic / professional email address
  • URL where we can verify the information details

Only academic/professional email addresses from the same organization as the signatory are accepted for the online request. All online requests coming from generic email providers such as gmail will be rejected.


Additional details

Related works

Is documented by
Conference paper: 10.1109/ISBA.2018.8311474 (DOI)


TeSLA – An Adaptive Trust-based e-assessment System for Learning (grant 688520)
European Commission