Brain-Assisted Speech Enhancement Dataset and code USTC

Xu, Qingtian; Zhang, Jie; Sun, Miao; Liang, Huadong; Li, Xin; Zhang, Yubang; Ling, Zhenhua

doi:10.5281/zenodo.18366436

Published May 28, 2026 | Version v1

Resource type: Dataset | Access: Open

1. University of Science and Technology of China
2. Guangzhou Maritime University
3. Artificial Intelligence Research Institute, iFLYTEK

Dataset Introduction

This dataset was released together with the paper “Qingtian Xu, Jie Zhang, Miao Sun, Huadong Liang, Xin Li, Zhenhua Ling, Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34:3073-3086, 2026.” The official implementation code is also publicly available.

If you use this dataset in your research, please cite the corresponding paper.

Overview

This dataset is designed for research on Auditory Attention Decoding (AAD) and Brain-Assisted Speech Enhancement (BASE) under dichotic listening conditions. The dataset contains synchronized EEG recordings and speech stimuli collected from normal-hearing subjects performing sustained auditory attention tasks.

A key characteristic of this dataset is that no speech pair is repeated. This reduces shortcut learning caused by repeated stimulus combinations and enables more rigorous evaluation of AAD-driven BASE systems.

Dataset Description

The dataset includes 18 normal-hearing subjects aged between 20 and 35 years. During the experiment, each subject is instructed to attend to one of two competing speakers in a dichotic listening scenario.

Each subject participates in 20 trials, where:

Each trial lasts 120 seconds.
Subjects switch their attended speaker between two successive trials.
Two speech stimuli are simultaneously presented through earphones, one for each ear.
After each trial, subjects are required to answer multiple-choice comprehension questions to verify attention engagement.
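
The trial structure above can be written down as a small schedule. This is only an illustrative sketch: the function name `build_schedule` and the labels `speaker_1` / `speaker_2` are hypothetical, and which speaker is attended first is an assumption made here. The actual per-trial assignment is recorded in group1.xlsx.

```python
TRIALS_PER_SUBJECT = 20
TRIAL_SECONDS = 120


def build_schedule(first_attended="speaker_1"):
    """Return (trial_number, attended_label) pairs with the attended speaker alternating."""
    labels = ("speaker_1", "speaker_2")  # hypothetical labels
    start = labels.index(first_attended)
    return [
        (trial, labels[(start + trial - 1) % 2])
        for trial in range(1, TRIALS_PER_SUBJECT + 1)
    ]


schedule = build_schedule()  # starting speaker is an assumption
total_seconds = TRIALS_PER_SUBJECT * TRIAL_SECONDS  # listening time per subject
print(schedule[:3])
print(total_seconds / 60, "minutes per subject")
```

With 20 trials of 120 seconds, each subject contributes 40 minutes of listening data before any trials are excluded.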

The recordings are conducted in a quiet office environment. During data acquisition, the computer screen in front of the subjects remains plain and free of visual distractions or additional stimuli.

Speech Stimuli

The speech materials consist of news recordings from Xinwen Lianbo, spoken by four native Chinese speakers and sampled at 44.1 kHz.

The original speech recordings are randomly paired to construct two-speaker mixtures. For each subject, the presented speech pairs are randomly selected from combinations of two out of the four speakers.

Importantly:

No identical speech pairs appear in the dataset.
To guarantee this property, all trials are carefully checked.
Four potentially repeated trials are removed from the final release.

This design improves the reliability of generalization evaluation for AAD and BASE models.
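
One way to check the no-repetition property is to treat each trial as an unordered pair of stimulus identifiers and look for duplicates. The sketch below uses made-up clip identifiers and a hypothetical helper `find_repeated_pairs`. Treating a pair with the ears swapped as a repeat is also an assumption. This is not the authors' checking procedure.

```python
from collections import Counter


def find_repeated_pairs(trials):
    """Return the unordered stimulus pairs that occur in more than one trial.

    `trials` is an iterable of (clip_a, clip_b) tuples. The left/right order is
    ignored, so ("a", "b") and ("b", "a") count as the same pair.
    """
    counts = Counter(frozenset(pair) for pair in trials)
    return [tuple(sorted(pair)) for pair, n in counts.items() if n > 1]


# Hypothetical clip identifiers, for illustration only.
trials = [
    ("spk1_clip03", "spk2_clip07"),
    ("spk3_clip01", "spk4_clip02"),
    ("spk2_clip07", "spk1_clip03"),  # same pair as the first trial, ears swapped
]
print(find_repeated_pairs(trials))
```

An empty result means every trial uses a distinct pair, which is the property the released data is stated to have.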

EEG Recording

EEG signals are recorded with a 64-channel BioSemi ActiveTwo system at a sampling rate of 8192 Hz.

For downstream AAD and BASE evaluation, the EEG data are further:

Downsampled to 128 Hz
Band-pass filtered between 0.1 Hz and 45 Hz
Processed with Independent Component Analysis (ICA) for artifact removal
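
As a quick sanity check on the preprocessed data, the expected size of one trial follows directly from the figures above. The sketch assumes each trial is stored as a channels-by-samples array, which is an assumption about the layout rather than something stated in this record.

```python
N_CHANNELS = 64          # EEG channels
TARGET_RATE_HZ = 128     # sampling rate after downsampling
TRIAL_SECONDS = 120      # trial length
BAND_HZ = (0.1, 45.0)    # band-pass edges

samples_per_trial = TARGET_RATE_HZ * TRIAL_SECONDS
expected_shape = (N_CHANNELS, samples_per_trial)  # assumed layout
nyquist_hz = TARGET_RATE_HZ / 2

# The upper band edge must lie below the Nyquist frequency of the new rate.
assert BAND_HZ[1] < nyquist_hz

print(expected_shape)   # (64, 15360)
print(nyquist_hz)       # 64.0
```

The 45 Hz upper edge sits below the 64 Hz Nyquist frequency, so the filtered band is fully representable at 128 Hz.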

File Structure

README.md
Dataset documentation and usage instructions.
group1.xlsx
Experimental paradigm records and trial metadata.
test.m
MATLAB preprocessing script for raw EEG recordings. The script generates .mat files used by the dataset generation pipeline.

Preprocessing Pipeline

The provided test.m script performs EEG preprocessing using EEGLAB.

The preprocessing steps include:

Loading Raw EEG Data
- Loads the raw EEG recordings stored as .cdt files.
Resampling
- Downsamples EEG signals to 128 Hz.
Epoch Extraction
- Extracts 120-second epochs for each trial.
Band-pass Filtering
- Applies a 0.1–45 Hz band-pass filter.
Non-EEG Channel Removal
- Removes:
  - HEO
  - VEO
  - TRIGGER
  - EKG
  - EMG
Artifact Removal
- Performs ICA using runica (extended mode).
- Uses ICLabel for automatic component classification.
- Removes components labeled as:
  - Muscle
  - Eye
  - Channel Noise
- Components whose ICLabel probability for any of these classes exceeds 0.7 are automatically rejected.
Re-referencing
- Applies average reference.
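
The rejection rule and the average reference described above can be illustrated outside MATLAB. The following is a minimal Python sketch, not a translation of test.m: the function names `flag_components` and `average_reference` are hypothetical, the probabilities are made up, and the data are plain nested lists rather than EEGLAB structures.

```python
# ICLabel reports seven class probabilities per component, in this order.
ICLABEL_CLASSES = ("Brain", "Muscle", "Eye", "Heart",
                   "Line Noise", "Channel Noise", "Other")
REJECT_CLASSES = ("Muscle", "Eye", "Channel Noise")
THRESHOLD = 0.7


def flag_components(probabilities, threshold=THRESHOLD):
    """Return indices of components whose probability for a rejected class exceeds the threshold."""
    reject_idx = [ICLABEL_CLASSES.index(name) for name in REJECT_CLASSES]
    return [
        comp
        for comp, probs in enumerate(probabilities)
        if any(probs[i] > threshold for i in reject_idx)
    ]


def average_reference(data):
    """Subtract the across-channel mean from every sample (data: list of channel rows)."""
    n_channels = len(data)
    n_samples = len(data[0])
    means = [sum(ch[t] for ch in data) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in data]


# Made-up probabilities for three components (each row sums to 1).
probs = [
    [0.90, 0.02, 0.03, 0.01, 0.01, 0.01, 0.02],  # mostly brain: kept
    [0.05, 0.05, 0.85, 0.01, 0.01, 0.01, 0.02],  # eye above 0.7: flagged
    [0.20, 0.60, 0.05, 0.05, 0.03, 0.02, 0.05],  # muscle below 0.7: kept
]
print(flag_components(probs))  # [1]

toy = [[1.0, 2.0], [3.0, 6.0]]  # 2 channels x 2 samples
print(average_reference(toy))   # [[-1.0, -2.0], [1.0, 2.0]]
```

After average referencing, the channel values at each time sample sum to zero, which is a simple property to check on the processed data.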

Excluded Trials

To ensure that no identical speech pairs exist in the released dataset, the following trials are excluded:

Subject 13 — Trial 16
Subject 14 — Trial 13
Subject 16 — Trial 4
Subject 17 — Trial 18
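
When iterating over the data, the excluded trials can be skipped with a small lookup. This sketch assumes subjects and trials are numbered from 1 and that every other trial is present in the release; the variable names are hypothetical.

```python
N_SUBJECTS = 18
TRIALS_PER_SUBJECT = 20
TRIAL_SECONDS = 120

# (subject, trial) pairs excluded from the release, as listed above.
EXCLUDED = {(13, 16), (14, 13), (16, 4), (17, 18)}

valid_trials = [
    (subject, trial)
    for subject in range(1, N_SUBJECTS + 1)
    for trial in range(1, TRIALS_PER_SUBJECT + 1)
    if (subject, trial) not in EXCLUDED
]

print(len(valid_trials))                         # 356
print(len(valid_trials) * TRIAL_SECONDS / 3600)  # total hours of EEG
```

Under these assumptions, 18 subjects with 20 trials each give 360 trials, of which 356 remain after the four exclusions.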

Important Notes for AAD-Driven BASE Research

For AAD-driven Brain-Assisted Speech Enhancement tasks, we strongly recommend avoiding EEG preprocessing methods based on frequency-band coupling.

Such operations may alter the original linear characteristics of EEG signals and potentially make the optimization of AAD modules significantly more difficult.

Citation

If you use this dataset, please cite the article:

@ARTICLE{11540442,
author={Xu, Qing-Tian and Zhang, Jie and Sun, Miao and Liang, Huadong and Li, Xin and Ling, Zhen-Hua},
journal={IEEE Transactions on Audio, Speech and Language Processing},
title={Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding},
year={2026},
volume={34},
number={},
pages={3073-3086}
}

Files

Total size: 14.9 GB

Name	Size	MD5
BASE_AAD_code.zip	758.1 kB	5b38da3f7b5ae24a55d8b4ef82a6cc7c
USTCdataset.zip	14.9 GB	e068243795fb93afdb3439a32a046922
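
The MD5 checksums listed for the two archives can be used to verify a download. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the local file paths are whatever you saved the archives as.

```python
import hashlib

# Checksums as listed for the files in this record.
EXPECTED_MD5 = {
    "BASE_AAD_code.zip": "5b38da3f7b5ae24a55d8b4ef82a6cc7c",
    "USTCdataset.zip": "e068243795fb93afdb3439a32a046922",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("USTCdataset.zip", "USTCdataset.zip")` should return `True` for an intact download. Reading in chunks keeps memory use low for the 14.9 GB archive.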
