I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge
- 1. Department of Electrical Engineering, Indian Institute of Technology Dharwad
Contributors
Research groups:
- 1. Govt. of India
- 2. IIIT Dharwad, Karnataka
- 3. KLETech, Hubballi, Karnataka
- 4. NIT Nagaland, Nagaland
- 5. CDAC Kolkata, WB
- 6. KLU Vijayawada, AP
- 7. NIT Patna, Bihar
- 8. IIT Dharwad, Karnataka
Description
Dear Users,
The data is password protected. To obtain the password, simply register using the link below; the data itself is free of cost.
Speaker Verification (SV) is the task of verifying the claimed identity of a claimant from his or her voice sample. Although SV technology has been researched extensively, work addressing multilingual conversation is limited. In a country like India, almost all speakers are polyglots, so developing a Multilingual SV (MSV) system on data collected in the Indian scenario is especially challenging. With this motivation, the Indic-Multilingual Speaker Verification (I-MSV) Challenge 2022 was designed to understand and compare state-of-the-art SV techniques. For the challenge, approximately 100 hours of speech from 100 speakers was collected using 5 different sensors in 13 Indian languages. The data is divided into development, training, and testing sets and has been made publicly available for further research. The goal of the challenge is to make SV systems robust to language and sensor mismatch between enrollment and testing. Participants were asked to develop SV systems in two scenarios, viz. constrained and unconstrained. The best systems in the constrained and unconstrained scenarios achieved Equal Error Rates (EER) of 2.12% and 0.26%, respectively.
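For reference, EER is the operating point where the false-acceptance rate equals the false-rejection rate. A minimal NumPy sketch of how it can be estimated from trial scores is shown below (this is an illustrative implementation, not the challenge's official scoring script; function and variable names are our own):

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Estimate the Equal Error Rate (EER) from verification trial scores.

    genuine_scores:  scores of target trials (claimant is the true speaker)
    impostor_scores: scores of non-target trials (claimant is an impostor)
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.unique(np.concatenate([genuine, impostor])))

    # FAR: fraction of impostor trials accepted; FRR: fraction of genuine trials rejected.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # EER lies where the two curves cross; take the closest observed point.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

For example, with `genuine = [0.9, 0.8, 0.7, 0.2]` and `impostor = [0.1, 0.3, 0.4, 0.6]`, the curves cross at threshold 0.6, giving an EER of 25%. Production scoring tools typically interpolate between thresholds for a smoother estimate.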
Files
Development data.zip
Additional details
Related works
- Is documented by
- Journal article: 10.48550/arXiv.2302.13209 (DOI)