Impact of Multi-Classifier Fusion on Target Speaker Detection in Audio Streams
Description
This article presents a multi-classifier fusion approach that makes target Speaker Detection (SD) systems more robust and improves their performance. Systems built on a single classifier can suffer significant performance degradation. To overcome this problem, we propose fusing three classifiers, Hierarchical Ascending Clustering (HAC), a Gaussian Mixture Model (GMM) and a Support Vector Machine (SVM), within an architecture based on Voice Activity Detection (VAD) in order to reduce speaker detection errors. We conducted a comparative study of the individual classifiers and their fusion. For evaluation, the three classifiers and their fusion were tested on telephone conversations extracted from the NIST-2005 corpus. The experimental results show that applying multi-classifier fusion on this architecture considerably improves the performance of the target SD system compared with each classifier applied alone. The fusion approach reaches a Speaker Detection Rate (SDR) of 99.18%, compared with 85.98% for HAC, 86.68% for GMM and 97.67% for SVM.
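The description does not state which fusion rule, VAD method or SDR formula the authors used, so the following is only a minimal sketch under stated assumptions. It assumes an energy-based VAD that keeps frames above a threshold, decision-level fusion by simple majority voting over the three classifiers' binary outputs (1 = target speaker, 0 = not), and SDR taken as the percentage of speech segments labeled correctly. All function names (`energy_vad`, `majority_vote`, `speaker_detection_rate`) and the toy decisions are hypothetical. They are not the authors' implementation and are not NIST-2005 results.

```python
from typing import List, Sequence


def energy_vad(frames: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """Hypothetical energy-based VAD: a frame counts as speech when its
    mean squared amplitude exceeds `threshold` (an assumed criterion)."""
    flags = []
    for frame in frames:
        energy = sum(x * x for x in frame) / max(len(frame), 1)
        flags.append(energy > threshold)
    return flags


def majority_vote(decisions: Sequence[Sequence[int]]) -> List[int]:
    """Assumed decision-level fusion: each inner sequence holds one classifier's
    binary per-segment decisions; a segment is accepted when more than half
    of the classifiers vote 1."""
    n_classifiers = len(decisions)
    return [1 if 2 * sum(votes) > n_classifiers else 0 for votes in zip(*decisions)]


def speaker_detection_rate(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Assumed SDR definition: percentage of segments whose decision matches
    the reference label."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


if __name__ == "__main__":
    # Toy audio frames: the first is near-silent and is dropped by the VAD.
    frames = [[0.01, -0.02, 0.01], [0.5, -0.6, 0.4], [0.7, 0.6, -0.5],
              [0.3, -0.4, 0.5], [0.6, -0.5, 0.7]]
    speech = energy_vad(frames, threshold=0.01)
    n_speech = sum(speech)

    # Hypothetical per-segment decisions for the speech segments only.
    outputs = {"HAC": [1, 0, 0, 1], "GMM": [1, 1, 0, 0], "SVM": [1, 1, 1, 1]}
    reference = [1, 1, 0, 1]
    assert all(len(d) == n_speech for d in outputs.values())

    for name, dec in outputs.items():
        print(f"{name}: {speaker_detection_rate(dec, reference):.2f}%")
    fused = majority_vote(list(outputs.values()))
    print(f"Fusion: {speaker_detection_rate(fused, reference):.2f}%")
```

In this toy run each single classifier makes one error on a different segment, so majority voting corrects all of them. This mirrors, on made-up data, the kind of complementarity that fusion relies on. A score-level rule such as a weighted sum of normalized classifier scores would be an equally plausible alternative.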
Files
| Name | Size | MD5 |
|---|---|---|
| 25_Kenai.pdf | 750.8 kB | a99a679de2ce18072651b14c2529b571 |