Published November 15, 2025 | Version v1
Publication Open

Impact of Multi-Classifier Fusion on Target Speaker Detection in Audio Streams

Description

This article presents a robust multi-classifier fusion approach for improving the performance of target Speaker Detection (SD) systems. Relying on a single classifier can significantly degrade detection performance. To overcome this problem, we propose fusing three classifiers, Hierarchical Ascending Clustering (HAC), Gaussian Mixture Model (GMM), and Support Vector Machine (SVM), within an architecture based on Voice Activity Detection (VAD) in order to reduce speaker detection errors. We conducted a comparative investigation of the individual classifiers and their fusion, evaluating all of them on telephone conversations extracted from the NIST-2005 corpus. The experimental results show that multi-classifier fusion on this architecture considerably improves the performance of the target SD system compared with each classifier applied individually: the fusion approach achieves a Speaker Detection Rate (SDR) of 99.18%, compared with 85.98% for HAC, 86.68% for GMM, and 97.67% for SVM.
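The description does not state which fusion rule is used, so the following is only a minimal sketch of one common option, decision-level fusion by majority vote. The function names, the per-segment binary decisions, and the definition of the detection rate as the percentage of segments labeled correctly are all assumptions made for illustration; the toy labels are invented and are not results from the article.

```python
from collections import Counter


def majority_vote(decisions):
    """Fuse binary decisions for one segment (1 = target speaker, 0 = not)."""
    counts = Counter(decisions)
    return 1 if counts[1] > counts[0] else 0


def fuse_streams(hac, gmm, svm):
    """Apply majority voting segment by segment across three classifiers."""
    if not (len(hac) == len(gmm) == len(svm)):
        raise ValueError("all classifiers must label the same segments")
    return [majority_vote(triple) for triple in zip(hac, gmm, svm)]


def detection_rate(predicted, reference):
    """Percentage of segments whose decision matches the reference label.

    Assumed definition; the article may define SDR differently.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and aligned")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical per-segment decisions (illustrative only, not article data).
reference = [1, 0, 1, 1, 0, 0, 1, 0]
hac = [1, 0, 0, 1, 0, 1, 1, 0]  # wrong on segments 2 and 5
gmm = [1, 1, 1, 1, 0, 0, 0, 0]  # wrong on segments 1 and 6
svm = [1, 0, 1, 0, 0, 0, 1, 0]  # wrong on segment 3

fused = fuse_streams(hac, gmm, svm)
for name, decisions in [("HAC", hac), ("GMM", gmm), ("SVM", svm), ("fusion", fused)]:
    print(f"{name}: {detection_rate(decisions, reference):.1f}%")
```

In this toy case the classifiers err on different segments, so the vote corrects every individual error. That is the intuition behind fusion: it helps most when the classifiers' mistakes are not strongly correlated.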

Files

25_Kenai.pdf (750.8 kB)
md5:a99a679de2ce18072651b14c2529b571