Speaker Recognition using Multiple X-Vector Speaker Representations with Two-Stage Clustering and Outlier Detection Refinement
Creators
- 1. Intelligent Voice Ltd
- 2. University of East London
Description
This paper presents a novel Variational Bayes x-vector Voice Print Extraction (VBxVPE) system, capable of capturing vocal variations using multiple x-vector representations with two-stage clustering and outlier detection for robust speaker recognition and verification. The presented approach demonstrates beyond the state-of-the-art results when evaluated against the ‘core-core’ and ‘core-multi’ evaluation conditions of the Speakers In the Wild dataset, achieving an Equal Error Rate of 1.06%, Cost of Detection score of 0.052, minimum Cost of Detection score of 0.010, Speaker Identification Accuracy of 95.84% with Precision, Recall and F1 score values of 0.964, 0.958 and 0.961, respectively on the ‘core-core’ evaluation condition and Equal Error Rate of 1.07%, Cost of Detection score of 0.066, minimum Cost of Detection score of 0.010 with Precision, Recall and F1 score values of 0.967, 0.963 and 0.965, respectively on the ‘core-multi’ evaluation condition.
Files
2022135314_Roman.pdf
Files
(226.4 kB)
Name | Size | Download all |
---|---|---|
md5:5805d2c5ccfeb564a6880ef8ce26ff55
|
226.4 kB | Preview Download |