PHE-Net: Envelope-Guided Speaker Extraction with Unlimited Speaker Scalability via WavLM-Based Discovery

Waris, Dariush

doi:10.5281/zenodo.19675768

Published April 21, 2026 | Version 1.0

Preprint Open

We present PHE-Net, a modular voice extraction system that separates individual speakers from single-channel mixtures of 2 to 20 simultaneous talkers. With oracle guidance, the system achieves +18.27 dB SI-SNRi and scales from N=2 to N=20 with zero degradation. In fully blind evaluation, with no enrollment audio, it achieves +8.20 dB SI-SNRi at N=10 speakers. Through systematic ablation, we find that the spectral envelope channel alone determines extraction quality: speaker embeddings are provably ignored (cosine 0.50 = cosine 1.00), and F0 pitch contributes nothing when the envelope is sufficient (zero-F0 ceiling = +16.25 dB at N=10). This finding reduces the research problem to a single well-defined challenge: improving blind spectral envelope estimation from multi-speaker mixtures.
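The results above are reported as SI-SNRi (scale-invariant signal-to-noise ratio improvement). This record does not include the evaluation code, so the following is only a minimal sketch of the commonly used definition: the SI-SNR of the extracted signal against the clean reference, minus the SI-SNR of the unprocessed mixture against the same reference. The function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are illustrative assumptions, not part of PHE-Net.

```python
import math


def _zero_mean(signal):
    """Return a copy of the signal with its mean removed."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant SNR in dB between an estimate and a clean reference.

    Standard textbook definition (an assumption here, not taken from the paper):
    project the estimate onto the reference, then compare the energy of that
    projection with the energy of the residual.
    """
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]                # part of est aligned with ref
    noise = [e - t for e, t in zip(est, target)]     # everything else
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the estimate is projected onto the reference before the energies are compared, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.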

Files (669.2 kB)

Name	MD5	Size
blind_N10_mix.wav	c0698eaca8aec782835cb7c44b0d8b4c	96.0 kB
blind_N10_spk10_iter0.wav	46d3534b64f256824a09e12a6066c47c	96.0 kB
blind_N10_spk10_iter1.wav	fcd490861db6d73798653ce0ef7e468f	96.0 kB
oracle_mix.wav	d8714c560b7342ce66678747bee0c58e	160.0 kB
oracle_sep_spk10.wav	d2802bb0ab178443b8437d3bf77a891c	160.0 kB
PHE-Net_Research_Paper.pdf	640f9407a9eddae06d62d8b492ba23fe	60.5 kB
README.txt	7679b76dc20175d0f844237c98c961b4	465 Bytes

	All versions	This version
Views	65	65
Downloads	54	54
Data volume	6.3 MB	6.3 MB
