4580204
doi
10.34777/gmc4-v249
oai:zenodo.org:4580204
user-idiap
user-eu
Khoury, Elie
Idiap Research Institute
Lazaridis, Alexandros
Idiap Research Institute
Marcel, Sébastien
Idiap Research Institute
Korshunov, Pavel
Idiap Research Institute
Goncalves, André R.
Research and Development Center in Telecommunications
Violato, Ricardo P. V.
Research and Development Center in Telecommunications
VoicePA
Kucur Ergünay, Serife
Idiap Research Institute
doi:10.1109/ISBA.2018.8311474
info:eu-repo/semantics/restrictedAccess
biometrics
speaker recognition
synthetic voice
spoofing
presentation attacks
<p>VoicePA is a dataset for speaker recognition and voice presentation attack detection (anti-spoofing). The dataset contains a set of bona fide (genuine) voice samples from 44 speakers and 24 different types of speech presentation attacks (spoofing attacks). The attacks were created using the bona fide data recorded for the AVspoof dataset.</p>
<p> </p>
<p><strong>Genuine data</strong></p>
<p>The genuine (non-attack) data is taken directly from the 'AVspoof' database and can be used by both automatic speaker verification (ASV) and presentation attack detection (PAD) systems (the folder 'genuine' contains this data). The genuine data acquisition process lasted approximately two months and involved 44 subjects, each participating in four sessions recorded under different environmental setups. During each recording session, subjects were asked to produce prepared (read) speech, pass-phrases, and free speech, recorded with three devices: one laptop with a high-quality microphone and two mobile phones (iPhone 3GS and Samsung S3).</p>
<p> </p>
<p><strong>Attack data</strong></p>
<p>Based on the genuine data, 24 types of presentation attacks were generated. Attacks were recorded in 3 different environments (two typical offices and a large conference room), using 5 different playback devices, including built-in laptop speakers, high-quality speakers, and three phones (iPhone 3GS, iPhone 6S, and Samsung S3), and assuming an ASV system running on either the laptop, the iPhone 3GS, or the Samsung S3. In addition to replay attacks (genuine speech is recorded and replayed to the microphone of an ASV system), two types of synthetic speech were also replayed: speech synthesis and voice conversion (for details on these algorithms, please refer to the paper published at BTAS 2015 describing the 'AVspoof' database).</p>
<p> </p>
<p><strong>Protocols</strong></p>
<p>The data in the 'voicePA' database is split into three non-overlapping subsets: training (genuine and attack samples from 4 female and 10 male subjects), development or 'Dev' (genuine and attack samples from 4 female and 10 male subjects), and evaluation or 'Eval' (genuine and attack samples from 5 female and 11 male subjects).</p>
<p> </p>
<p><strong>Reference</strong></p>
<p>Pavel Korshunov, André R. Goncalves, Ricardo P. V. Violato, Flávio O. Simões and Sébastien Marcel. "On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection", International Conference on Identity, Security and Behavior Analysis (ISBA), 2018.<br>
doi:10.1109/ISBA.2018.8311474<br>
<a href="http://publications.idiap.ch/index.php/publications/show/3779">http://publications.idiap.ch/index.php/publications/show/3779</a></p>
Zenodo
2018-12-12
info:eu-repo/semantics/other
4580203
user-idiap
user-eu
award_title=An Adaptive Trust-based e-assessment System for Learning; award_number=688520; award_identifiers_scheme=url; award_identifiers_identifier=https://cordis.europa.eu/projects/688520; funder_id=00k4n6c32; funder_name=European Commission;
1678106572.576081
public
10.1109/ISBA.2018.8311474
Is documented by
doi