
VoicePA

Kucur Ergünay, Serife; Khoury, Elie; Lazaridis, Alexandros; Marcel, Sébastien; Korshunov, Pavel; Goncalves, André R.; Violato, Ricardo P. V.

VoicePA is a dataset for speaker recognition and voice presentation attack detection (anti-spoofing). The dataset contains a set of Bona Fide (genuine) voice samples from 44 speakers and 24 different types of speech presentation attacks (spoofing attacks). The attacks were created using the Bona Fide data recorded for the AVspoof dataset.

 

Genuine data

The genuine (non-attack) data is taken directly from the 'AVspoof' database and can be used by both automatic speaker verification (ASV) and presentation attack detection (PAD) systems. It is stored in the 'genuine' folder. The genuine data acquisition process lasted approximately two months and involved 44 subjects, each participating in four sessions configured with different environmental setups. During each recording session, subjects were asked to utter prepared (read) speech, pass-phrases, and free speech, which were recorded with three devices: one laptop with a high-quality microphone and two mobile phones (iPhone 3GS and Samsung S3).

 

Attack data

Based on the genuine data, 24 types of presentation attacks were generated. Attacks were recorded in 3 different environments (two typical offices and a large conference room) using 5 different playback devices: built-in laptop speakers, high-quality speakers, and three phones (iPhone 3GS, iPhone 6S, and Samsung S3). The attacks assume an ASV system running on either the laptop, the iPhone 3GS, or the Samsung S3. In addition to replay attacks (speech is recorded and replayed to the microphone of an ASV system), two types of synthetic speech were also replayed: speech synthesis and voice conversion (for details on these algorithms, please refer to the BTAS 2015 paper describing the 'AVspoof' database).

 

Protocols

The data in the 'voicePA' database is split into three non-overlapping subsets: training (genuine and attack samples from 4 female and 10 male subjects), development or 'Dev' (genuine and attack samples from 4 female and 10 male subjects), and evaluation or 'Eval' (genuine and attack samples from 5 female and 11 male subjects).
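
The subject counts above can be cross-checked against the 44 speakers stated in the dataset description. The following is a minimal Python sketch of that check. The dictionary layout, the subset keys ('train', 'dev', 'eval'), and the function names are illustrative assumptions and do not reflect the dataset's actual file or protocol naming.

```python
# Subject counts per subset, copied from the protocol description.
# The structure and key names are hypothetical, chosen for illustration only.
SPLIT = {
    "train": {"female": 4, "male": 10},
    "dev": {"female": 4, "male": 10},
    "eval": {"female": 5, "male": 11},
}


def speakers_per_subset(split):
    """Return the number of subjects in each subset."""
    return {name: sum(counts.values()) for name, counts in split.items()}


def total_speakers(split):
    """Return the total number of subjects across all subsets."""
    return sum(speakers_per_subset(split).values())


if __name__ == "__main__":
    print(speakers_per_subset(SPLIT))  # {'train': 14, 'dev': 14, 'eval': 16}
    print(total_speakers(SPLIT))       # 44
```

Because the subsets are non-overlapping, the per-subset counts (14, 14, and 16) add up to the 44 subjects of the genuine data.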

 

Reference

Pavel Korshunov, André R. Goncalves, Ricardo P. V. Violato, Flávio O. Simões and Sébastien Marcel. "On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection", International Conference on Identity, Security and Behavior Analysis, 2018.
DOI: 10.1109/ISBA.2018.8311474
http://publications.idiap.ch/index.php/publications/show/3779

Restricted Access

You may request access to the files in this upload, provided that you fulfil the conditions below. The decision to grant or deny access is solely the responsibility of the record owner.


Access to the dataset is based on an End-User License Agreement. The use of the dataset is strictly restricted to non-commercial research.

Please provide us with the following information about the authorized signatory (who MUST hold a permanent position):

  • Full name
  • Name of organization
  • Position / job title
  • Academic / professional email address
  • URL where we can verify the information details

The requester must request access using their own valid email address from the same organization as the signatory.

