Speaker Recognition using Multiple X-Vector Speaker Representations with Two-Stage Clustering and Outlier Detection Refinement

Shrestha, Roman; Glackin, Cornelius; Wall, Julie; Cannings, Nigel; Rajwadi, Marvin; Kada, Satya; Laird, James; Laird, Thea; Woodruff, Chris

doi:10.5281/zenodo.7017135

Published July 27, 2022 | Version preprint

Conference paper Open

1. Intelligent Voice Ltd
2. University of East London

This paper presents a novel Variational Bayes x-vector Voice Print Extraction (VBxVPE) system that captures vocal variation using multiple x-vector speaker representations, with two-stage clustering and outlier detection, for robust speaker recognition and verification. The approach surpasses state-of-the-art results on the ‘core-core’ and ‘core-multi’ evaluation conditions of the Speakers In the Wild dataset. On the ‘core-core’ condition, it achieves an Equal Error Rate of 1.06%, a Cost of Detection score of 0.052, a minimum Cost of Detection score of 0.010 and a Speaker Identification Accuracy of 95.84%, with Precision, Recall and F1 score values of 0.964, 0.958 and 0.961, respectively. On the ‘core-multi’ condition, it achieves an Equal Error Rate of 1.07%, a Cost of Detection score of 0.066 and a minimum Cost of Detection score of 0.010, with Precision, Recall and F1 score values of 0.967, 0.963 and 0.965, respectively.
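
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how an Equal Error Rate and an F1 score are commonly computed. It is not the authors' evaluation code: the function names `equal_error_rate` and `f1_score` are hypothetical, the trial scores are invented for illustration, and the threshold sweep is a simple approximation rather than an interpolated EER.

```python
from typing import Sequence, Tuple


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection are closest.
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy verification scores, invented purely for illustration (not from the paper).
targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trials
nontargets = [0.6, 0.3, 0.2, 0.1]   # different-speaker trials
eer, thr = equal_error_rate(targets, nontargets)

# Consistency check on the 'core-core' figures quoted in the abstract.
f1_core_core = f1_score(0.964, 0.958)
```

On the toy scores, one target and one non-target trial are misclassified at the chosen threshold, so both error rates equal 0.25. The harmonic mean of the quoted 'core-core' Precision and Recall rounds to the quoted F1 of 0.961; if the paper averages these metrics per speaker, that agreement is only a sanity check, not a reproduction of its computation.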

Files

Name	Size
2022135314_Roman.pdf (md5:5805d2c5ccfeb564a6880ef8ce26ff55)	226.4 kB

Additional details

European Commission
MENHIR – Mental health monitoring through interactive conversations (grant 823907)

	All versions	This version
Views	374	373
Downloads	184	184
Data volume	43.9 MB	43.9 MB

Resource type

Conference paper

Publisher

Zenodo

Conference

Social and Biometric data for Applications in human machine interactions: Models and algorithms (SOBIOAPPS), in conjunction with the 7th IEEE Cyber Science and Technology Congress (CyberSciTech 2022), Calabria, Italy, 12 September 2022 (Session SOBIOAPPS)

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 23, 2022
Modified: July 16, 2024
