
Published February 26, 2023 | Version 1
Dataset | Open

I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge

  • Department of Electrical Engineering, Indian Institute of Technology Dharwad, Govt. of India
  • IIIT Dharwad, Karnataka
  • KLETech, Hubballi, Karnataka
  • NIT Nagaland, Nagaland
  • CDAC Kolkata, West Bengal
  • KLU Vijayawada, Andhra Pradesh
  • NIT Patna, Bihar
  • IIT Dharwad, Karnataka

Description

Dear Users,

The data is password protected. To obtain the password, simply register using the link below. Note that the data is free of cost.

Click here for Registration
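
Once the password has been received, the archives can be unpacked with any standard ZIP tool. Below is a minimal Python sketch; the local path and password placeholder are illustrative assumptions, and the sketch assumes classic ZipCrypto encryption (AES-encrypted archives would need a library such as pyzipper instead).

    import zipfile

    # Hypothetical local path and password placeholder; substitute the real
    # password received after registration.
    ARCHIVE = "Development data.zip"
    PASSWORD = b"<password-from-registration>"  # zipfile expects bytes

    with zipfile.ZipFile(ARCHIVE) as zf:
        # Extract every member into a local directory.
        zf.extractall(path="i_msv_2022_dev", pwd=PASSWORD)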

Speaker Verification (SV) is the task of verifying the claimed identity of a claimant from his/her voice sample. Although there exists an ample amount of research on SV technologies, development concerning multilingual conversation is limited. In a country like India, almost all speakers are polyglots. Consequently, developing a Multilingual SV (MSV) system on data collected in the Indian scenario is more challenging. With this motivation, the Indic-Multilingual Speaker Verification (I-MSV) Challenge 2022 was designed for understanding and comparing state-of-the-art SV techniques.

For the challenge, approximately 100 hours of speech from 100 speakers were collected using 5 different sensors in 13 Indian languages. The data are divided into development, training, and testing sets and have been made publicly available for further research. The goal of the challenge is to make SV systems robust to language and sensor variations between enrollment and testing. Participants were asked to develop SV systems in two scenarios, viz. constrained and unconstrained. The best systems in the constrained and unconstrained scenarios achieved Equal Error Rates (EER) of 2.12% and 0.26%, respectively.
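
For readers unfamiliar with the metric, EER is the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR) of the verification system. A minimal sketch of one common way to compute it from trial scores follows; the toy scores and labels are hypothetical, and this is not the challenge's official scoring script.

    import numpy as np

    def equal_error_rate(scores, labels):
        """EER: the point where false-acceptance rate == false-rejection rate.

        scores: similarity score per trial (higher = more likely same speaker)
        labels: 1 for genuine (target) trials, 0 for impostor trials
        """
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=int)
        n_genuine = labels.sum()
        n_impostor = len(labels) - n_genuine
        best_far, best_frr, best_diff = 1.0, 0.0, float("inf")
        # Sweep every observed score as a candidate decision threshold.
        for t in np.sort(np.unique(scores)):
            accept = scores >= t
            far = np.sum(accept & (labels == 0)) / n_impostor   # false acceptances
            frr = np.sum(~accept & (labels == 1)) / n_genuine   # false rejections
            if abs(far - frr) < best_diff:
                best_far, best_frr, best_diff = far, frr, abs(far - frr)
        # Report the crossing point of the two error rates.
        return (best_far + best_frr) / 2

    # Hypothetical toy trials:
    print(equal_error_rate([0.9, 0.8, 0.4, 0.3, 0.7, 0.2],
                           [1,   1,   0,   0,   0,   1]))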

Files (19.6 GB)

Development data.zip

  • 11.5 GB (md5:6143d53b903740f6d8cf8c6632d65926)
  • 856.0 MB (md5:0a7cc503e75ae235de3d3b4930cb221a)
  • 4.1 GB (md5:5f2c09bf8d5a4bd9510c7d0640fbe1f9)
  • 3.1 GB (md5:a6a06d4787653b8f96129b1f7eb95b36)
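
Since an MD5 checksum is published for each archive, it is worth verifying downloads before unpacking. A minimal sketch is below; pairing Development data.zip with the first hash is an assumption for illustration, so match each downloaded archive to its hash as listed on the record page.

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        """Compute the MD5 digest of a file, streaming to bound memory use."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical file-to-hash pairing; fill in the remaining archives
    # with their published checksums from the record page.
    expected = {
        "Development data.zip": "6143d53b903740f6d8cf8c6632d65926",
    }

    for name, digest in expected.items():
        status = "OK" if md5sum(name) == digest else "MISMATCH"
        print(f"{name}: {status}")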

Additional details

Related works

Is documented by
Preprint: 10.48550/arXiv.2302.13209 (DOI)
