I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge
- 1. Department of Electrical Engineering, Indian Institute of Technology Dharwad
Contributors
Research groups:
- 1. Govt. of India
- 2. IIIT Dharwad, Karnataka
- 3. KLETech, Hubballi, Karnataka
- 4. NIT Nagaland, Nagaland
- 5. CDAC Kolkata, WB
- 6. KLU Vijayawada, AP
- 7. NIT Patna, Bihar
- 8. IIT Dharwad, Karnataka
Description
Dear Users,
The data is password protected. To obtain the password, simply register using the link below; the data itself is free of cost.
Speaker Verification (SV) is the task of verifying the claimed identity of a claimant from his or her voice sample. Although SV technology has been researched extensively, work addressing multilingual conversation is limited. In a country like India, almost all speakers are polyglots, so developing a Multilingual SV (MSV) system on data collected in the Indian scenario is especially challenging. With this motivation, the Indic-Multilingual Speaker Verification (I-MSV) Challenge 2022 was designed to understand and compare state-of-the-art SV techniques. For the challenge, approximately 100 hours of speech from 100 speakers was collected using 5 different sensors in 13 Indian languages. The data is divided into development, training, and testing sets and has been made publicly available for further research. The goal of the challenge is to make SV systems robust to language and sensor mismatch between enrollment and testing. Participants were asked to develop SV systems in two scenarios, viz. constrained and unconstrained. The best systems in the constrained and unconstrained scenarios achieved Equal Error Rates (EER) of 2.12% and 0.26%, respectively.
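For reference, EER is the operating point where the false-acceptance rate equals the false-rejection rate. A minimal NumPy sketch of how it can be estimated from trial scores is shown below (this is an illustrative implementation, not the challenge's official scoring script; function and variable names are our own):

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Estimate the Equal Error Rate (EER) from verification trial scores.

    genuine_scores:  scores of target trials (claimant is the true speaker)
    impostor_scores: scores of non-target trials (claimant is an impostor)
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.unique(np.concatenate([genuine, impostor])))

    # FAR: fraction of impostor trials accepted; FRR: fraction of genuine trials rejected.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # EER lies where the two curves cross; take the closest observed point.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

For example, with `genuine = [0.9, 0.8, 0.7, 0.2]` and `impostor = [0.1, 0.3, 0.4, 0.6]`, the curves cross at threshold 0.6, giving an EER of 25%. Production scoring tools typically interpolate between thresholds for a smoother estimate.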
Files
Development data.zip
Additional details
Related works
- Is documented by
- Journal article: 10.48550/arXiv.2302.13209 (DOI)