Brain-Assisted Speech Enhancement Dataset and code USTC
Dataset Introduction
This dataset was released together with the paper “Qingtian Xu, Jie Zhang, Miao Sun, Huadong Liang, Xin Li, Zhenhua Ling, Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34:3073-3086, 2026.” The official implementation code is also publicly available.
If you use this dataset in your research, please cite the corresponding paper.
Overview
This dataset is designed for research on Auditory Attention Decoding (AAD) and Brain-Assisted Speech Enhancement (BASE) under dichotic listening conditions. The dataset contains synchronized EEG recordings and speech stimuli collected from normal-hearing subjects performing sustained auditory attention tasks.
A key characteristic of this dataset is that no identical speech pairs are repeated, which helps reduce shortcut learning caused by repeated stimulus combinations and enables more rigorous evaluation of AAD-driven BASE systems.
Dataset Description
The dataset includes 18 normal-hearing subjects aged between 20 and 35 years. During the experiment, each subject is instructed to attend to one of two competing speakers in a dichotic listening scenario.
Each subject participates in 20 trials, where:
- Each trial lasts 120 seconds.
- Subjects switch their attended speaker between two successive trials.
- Two speech stimuli are simultaneously presented through earphones, one for each ear.
- After each trial, subjects are required to answer multiple-choice comprehension questions to verify attention engagement.
The recordings are conducted in a quiet office environment. During data acquisition, the computer screen in front of the subjects remains plain and free of visual distractions or additional stimuli.
Speech Stimuli
The speech materials consist of news recordings from Xinwen Lianbo, spoken by four native Chinese speakers and sampled at 44.1 kHz.
The original speech recordings are randomly paired to construct two-speaker mixtures. For each subject, the presented speech pairs are randomly selected from combinations of two out of the four speakers.
Importantly:
- No identical speech pairs appear in the dataset.
- To guarantee this property, all trials are carefully checked.
- Four potentially repeated trials are removed from the final release.
This design improves the reliability of generalization evaluation for AAD and BASE models.
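The "no identical speech pairs" check can be illustrated with a minimal sketch. It assumes each trial is described by a hypothetical record `(subject, trial, clip_a, clip_b)` and that a pair counts as repeated regardless of which ear receives which clip. The clip names and the helper `find_repeated_pairs` are made up for illustration and are not part of the released files.

```python
from collections import defaultdict


def find_repeated_pairs(trials):
    """Return {pair: [(subject, trial), ...]} for pairs used more than once."""
    seen = defaultdict(list)
    for subject, trial, clip_a, clip_b in trials:
        # Assumption: a pair is unordered, so (a, b) and (b, a) are the same mixture.
        seen[frozenset((clip_a, clip_b))].append((subject, trial))
    return {pair: uses for pair, uses in seen.items() if len(uses) > 1}


# Hypothetical trial records; real metadata would come from the paradigm spreadsheet.
trials = [
    (1, 1, "spk1_clip01", "spk2_clip07"),
    (1, 2, "spk3_clip02", "spk4_clip05"),
    (2, 1, "spk2_clip07", "spk1_clip01"),  # same pair as subject 1, trial 1
]

repeats = find_repeated_pairs(trials)
for pair, uses in repeats.items():
    print(sorted(pair), "used in", uses)
```

Any pair reported by such a check would be a candidate for exclusion from the release.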
EEG Recording
EEG signals are recorded using a 64-channel BioSemi ActiveTwo system at a sampling rate of 8196 Hz.
For downstream AAD and BASE evaluation, the EEG data are further:
- Downsampled to 128 Hz
- Band-pass filtered between 0.1 Hz and 45 Hz
- Processed with Independent Component Analysis (ICA) for artifact removal
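As a quick sanity check on these numbers, the sketch below works out the expected size of one preprocessed trial and confirms that the 45 Hz upper cutoff sits below the Nyquist frequency of the 128 Hz data. The channels-by-samples layout is an assumption for illustration; the actual array layout in the released files may differ.

```python
TRIAL_SECONDS = 120      # trial length stated in the dataset description
TARGET_RATE_HZ = 128     # sampling rate after downsampling
N_CHANNELS = 64          # number of EEG channels
BAND_HZ = (0.1, 45.0)    # band-pass edges

# Samples per channel in one preprocessed trial.
samples_per_trial = TRIAL_SECONDS * TARGET_RATE_HZ

# Assumed layout: channels x samples.
trial_shape = (N_CHANNELS, samples_per_trial)

# The pass band must lie strictly below the Nyquist frequency.
nyquist_hz = TARGET_RATE_HZ / 2
band_ok = 0 < BAND_HZ[0] < BAND_HZ[1] < nyquist_hz

print(trial_shape, nyquist_hz, band_ok)
```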
File Structure
- README.md: Dataset documentation and usage instructions.
- group1.xlsx: Experimental paradigm records and trial metadata.
- test.m: MATLAB preprocessing script for raw EEG recordings. The script generates the .mat files used by the dataset generation pipeline.
Preprocessing Pipeline
The provided test.m script performs EEG preprocessing using EEGLAB.
The preprocessing steps include:
- Loading Raw EEG Data
  - Loads BioSemi .cdt files.
- Resampling
  - Downsamples EEG signals to 128 Hz.
- Epoch Extraction
  - Extracts 120-second epochs for each trial.
- Band-pass Filtering
  - Applies a 0.1–45 Hz band-pass filter.
- Non-EEG Channel Removal
  - Removes:
    - HEO
    - VEO
    - TRIGGER
    - EKG
    - EMG
- Artifact Removal
  - Performs ICA using runica (extended mode).
  - Uses ICLabel for automatic component classification.
  - Removes components labeled as:
    - Muscle
    - Eye
    - Channel Noise
  - Components with probability greater than 0.7 are automatically rejected.
- Re-referencing
  - Applies average reference.
Excluded Trials
To ensure that no identical speech pairs exist in the released dataset, the following trials are excluded:
- Subject 13, Trial 16
- Subject 14, Trial 13
- Subject 16, Trial 4
- Subject 17, Trial 18
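When iterating over the dataset, the exclusions can be applied with a simple lookup. The sketch below assumes subjects and trials are numbered from 1, which is not stated explicitly above, and derives how many trials remain in the release.

```python
N_SUBJECTS = 18
TRIALS_PER_SUBJECT = 20
TRIAL_SECONDS = 120

# (subject, trial) pairs excluded from the release, as listed above.
EXCLUDED = {(13, 16), (14, 13), (16, 4), (17, 18)}

# Assumption: subject and trial indices start at 1.
released = [
    (subject, trial)
    for subject in range(1, N_SUBJECTS + 1)
    for trial in range(1, TRIALS_PER_SUBJECT + 1)
    if (subject, trial) not in EXCLUDED
]

total_hours = len(released) * TRIAL_SECONDS / 3600
print(len(released), "trials,", round(total_hours, 2), "hours of EEG")
```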
Important Notes for AAD-Driven BASE Research
For AAD-driven Brain-Assisted Speech Enhancement tasks, we strongly recommend avoiding EEG preprocessing methods based on frequency-band coupling.
Such operations may alter the original linear characteristics of EEG signals and potentially make the optimization of AAD modules significantly more difficult.
Citation
If you use this dataset, please cite the article:
@ARTICLE{11540442,
author={Xu, Qing-Tian and Zhang, Jie and Sun, Miao and Liang, Huadong and Li, Xin and Ling, Zhen-Hua},
journal={IEEE Transactions on Audio, Speech and Language Processing},
title={Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding},
year={2026},
volume={34},
number={},
pages={3073-3086}
}