Audiovisual, Gaze-controlled Auditory Attention Decoding Dataset KU Leuven (AV-GC-AAD)
Description
This dataset is described in detail in the following journal paper [1]:
Rotaru, I., Geirnaert, S., Heintz, N., Van de Ryck, I., Bertrand, A., & Francart, T. (2024). What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention. Journal of Neural Engineering, 21(1), 016017.
https://iopscience.iop.org/article/10.1088/1741-2552/ad2214/meta
*** If using this dataset, please cite the original paper above and the current Zenodo repository. ***
________________________________________________________________________________
Dataset description
This work was performed at ExpORL, Dept. Neurosciences, KU Leuven and Dept. Electrical Engineering (ESAT), KU Leuven (Belgium), with the goal of investigating and controlling for the effect of gaze during a competing listening task.
The full dataset contains EEG and EOG data collected from 16 normal-hearing subjects during a competing listening task, in which the subjects were instructed to focus on one of two competing speech signals. However, subjects 2, 5 and 6 were excluded from the online repository because they did not consent to sharing their data in a public database (cf. signed informed consents approved by the KU Leuven Ethical Committee). EEG recordings were conducted in a soundproof, electromagnetically shielded room at ExpORL, KU Leuven. The BioSemi ActiveTwo system was used to record 64-channel EEG signals at an 8192 Hz sample rate. Additionally, the participants' eye movements were measured via 4 EOG (electrooculography) electrodes placed symmetrically around the eyes.
The audio signals were presented to each subject at 65 dB SPL through a pair of insert phones (Etymotic ER10). In some experimental trials, a video of the attended talker was also presented on the screen. The originally presented speech and video stimuli (.wav and .mp4 files) are excluded from the dataset due to copyright restrictions. However, the acoustic envelopes of the attended and unattended audio stimuli were precalculated and are included in the dataset (see below).
The experiments were conducted using custom-made Python scripts.
The experimental trials were split into two blocks. Each block consisted of the following sequence of conditions: MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo. The auditory task was the same in all conditions: the subjects had to attend to one of the two presented talkers, as indicated by an arrow on the screen. The visual task differed across conditions:
- MovingVideo: the subjects had to follow the moving video of the to-be-attended speaker presented on a randomized horizontal trajectory on the screen.
- MovingTargetNoise: the subjects had to follow a moving cross-hair presented on a randomized horizontal trajectory on the screen.
- NoVisuals: a black screen was presented and the subjects had to fixate on an imaginary point in the center of the screen while minimizing eye movements.
- StaticVideo: the subjects had to fixate on the static video of the to-be-attended speaker, presented on the same side as the audio stimulus of the attended speaker.
The full description of all experimental conditions can be consulted in [1].
Each trial/condition lasted 10 minutes, with a spatial switch in attention after 5 minutes (i.e., the speech stimuli presented to the left and right insert phones were programmed to swap sides, such that after the switch the subjects kept listening to the same speaker, but coming from the opposite spatial location). To keep the subjects motivated, they had to answer one comprehension question about the attended acoustic stimulus after each trial.
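For per-location analyses, a trial can thus be split at the attention switch. The following minimal Python sketch assumes the preprocessed 128 Hz data described below and the nominal 5-minute switch time; the exact per-trial switch samples should be taken from the randomization variable described below:

    import numpy as np

    fs = 128                               # sampling rate of the preprocessed data (see below)
    switch_sample = 5 * 60 * fs            # nominal attention switch after 5 minutes
    trial = np.zeros((10 * 60 * fs, 68))   # placeholder for one trial: samples x (64 EEG + 4 EOG)
    first_half = trial[:switch_sample]     # attention at the initial side (cf. initAttention)
    second_half = trial[switch_sample:]    # same talker, opposite spatial location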
For each subject, there is a .mat file containing the following variables (a loading sketch in Python is given after the list):
- conditionID: the condition ID for each trial
- data: the preprocessed EEG and EOG data for each trial (the first 64 channels are EEG, the last 4 are EOG)
- fs: the sampling rate of the EEG, EOG and stimulus envelopes
- initAttention: the spatial location of the initially attended stimulus for each trial
- metadata: the original metadata (e.g., channel names, triggers) saved in the raw .bdf files for each trial
- params: the filtering parameters used for each trial
- randomization: the randomization parameters (e.g., presented stimuli, attention switch times) for each trial
- stimulus: the precalculated envelopes of the attended and unattended stimuli for each trial
- subjID: the anonymised ID of the current subject
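The .mat files can be inspected in Python, for example. This is a minimal sketch assuming the files were saved in a pre-v7.3 MATLAB format (for v7.3 files an HDF5 reader such as h5py is needed instead); the filename and the exact nesting of the per-trial variables are assumptions:

    import scipy.io as sio

    # load one subject file (hypothetical filename)
    mat = sio.loadmat("subject01.mat", squeeze_me=True, struct_as_record=False)

    fs = int(mat["fs"])              # common sampling rate of EEG, EOG and envelopes
    data = mat["data"]               # one array per trial: samples x 68 channels
    conditions = mat["conditionID"]  # condition label per trial

    trial = data[0]
    eeg = trial[:, :64]  # first 64 channels: EEG
    eog = trial[:, 64:]  # last 4 channels: EOG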
EEG and EOG preprocessing
All of the following preprocessing steps were applied per trial. The EEG was first downsampled from 8192 Hz to 256 Hz using an antialiasing filter. The data was then bandpass-filtered between 1 and 40 Hz using a zero-phase Chebyshev filter (type II, with 80 dB attenuation at 10% outside the passband). Finally, the data was downsampled to 128 Hz to speed up computation.
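For reference, an equivalent chain can be expressed with scipy.signal as follows. This is a sketch, not the original code: the filter order (N=8) is an illustrative assumption, and the zero-phase response is obtained by forward-backward filtering:

    from scipy import signal

    def preprocess_trial(raw, fs_in=8192):
        # antialiasing downsample 8192 Hz -> 256 Hz (factor 32, cascaded as 8 * 4)
        x = signal.decimate(raw, 8, axis=0, zero_phase=True)
        x = signal.decimate(x, 4, axis=0, zero_phase=True)
        fs = fs_in // 32  # 256 Hz

        # zero-phase 1-40 Hz Chebyshev type-II bandpass; stopband edges sit
        # 10% outside the passband with 80 dB attenuation (order N=8 assumed)
        sos = signal.cheby2(N=8, rs=80, Wn=[0.9, 44.0], btype="bandpass",
                            fs=fs, output="sos")
        x = signal.sosfiltfilt(sos, x, axis=0)

        # final downsample 256 Hz -> 128 Hz
        x = signal.decimate(x, 2, axis=0, zero_phase=True)
        return x, fs // 2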
Speech envelope extraction
The original speech signals, sampled at 44100 Hz, were downsampled to 8192 Hz (to match the EEG sampling rate). They were then passed through a gammatone filterbank, which roughly approximates the spectral decomposition performed by the human auditory system. Per subband, the audio envelope was extracted and its dynamic range was compressed using a power-law operation with exponent 0.6 (as proposed in [2]). Each subband envelope was then bandpass-filtered with the same filter used for the EEG data. The resulting subband envelopes were summed to construct a single broadband envelope. Finally, the envelope signals were downsampled to 128 Hz to match the sampling rate of the preprocessed EEG.
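A rough Python sketch of this pipeline is given below; the gammatone implementation (scipy.signal.gammatone), the set of 28 center frequencies, and the bandpass filter order are assumptions here, and the exact filterbank is the one specified in [2]:

    import numpy as np
    from scipy import signal

    def broadband_envelope(audio, fs_audio=44100):
        # 44100 Hz -> 8192 Hz (ratio reduced by the gcd of 4: 2048/11025)
        x = signal.resample_poly(audio, up=2048, down=11025)
        fs = 8192

        # same zero-phase 1-40 Hz Chebyshev type-II bandpass as used for the EEG
        sos = signal.cheby2(8, 80, [0.9, 44.0], btype="bandpass", fs=fs, output="sos")

        env = np.zeros_like(x)
        for cf in np.geomspace(150.0, 4000.0, 28):     # assumed center frequencies
            b, a = signal.gammatone(cf, "iir", fs=fs)  # 4th-order gammatone subband filter
            sub = signal.lfilter(b, a, x)
            sub_env = np.abs(sub) ** 0.6               # envelope + power-law compression
            env += signal.sosfiltfilt(sos, sub_env)    # bandpass-filter the subband envelope

        # 8192 Hz -> 128 Hz to match the preprocessed EEG (factor 64 = 8 * 8)
        env = signal.decimate(env, 8, zero_phase=True)
        env = signal.decimate(env, 8, zero_phase=True)
        return env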
Notes
- For subjects 1-3, 6 trials corresponding to 3 conditions (MovingVideo, NoVisuals, StaticVideo) were measured.
- For subjects 4-16, 8 trials corresponding to 4 conditions (MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo) were measured.
- For subject 14, trial 2 of the StaticVideo condition was not recorded due to technical problems.
- In the dataset, 'FixedVideo' is the alias name for the 'StaticVideo' condition described in [1].
- The EEG/EOG data was not referenced. Before further analysis, re-referencing the data (e.g., to an arbitrary EEG channel, or to the common average of all channels) is necessary to achieve better common-mode rejection and thus increase the SNR of the recorded data (for details, see https://www.biosemi.com/faq/cms&drl.htm). A minimal re-referencing sketch is given below.
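For example, common-average re-referencing of the 64 EEG channels takes one line (assuming eeg is a samples x channels array as in the loading sketch above):

    import numpy as np

    def common_average_reference(eeg):
        # subtract the instantaneous mean over all channels from each channel
        return eeg - eeg.mean(axis=1, keepdims=True)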
References
[1] Rotaru, Iustina, et al. "What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention." Journal of Neural Engineering 21.1 (2024): 016017.
[2] Biesmans, Wouter, et al. "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario." IEEE Transactions on Neural Systems and Rehabilitation Engineering 25.5 (2016): 402-412.
Files (2.0 GB)

Name | Size | MD5
---|---|---
 | 123.6 MB | md5:2145a5838217d3a51af4b3f62ee7ab47
 | 123.6 MB | md5:79b71bce19cddef42f006ba7764e1403
 | 164.9 MB | md5:755124839d300da7d71b1947927a020e
 | 164.8 MB | md5:dbabf01f6c966a5892e1413114e0a0d3
 | 164.8 MB | md5:fbf55ec5fad4057d9985064892f488f9
 | 164.9 MB | md5:dc15f5069c74079332b6658cb781c2d8
 | 164.9 MB | md5:0fa7f8d1b2c6919108ea911ccc00e432
 | 164.9 MB | md5:9878079ef05386134589e7a499ae0186
 | 164.8 MB | md5:8e5bb4b8c88f4277a88f09b79fe2de5e
 | 164.9 MB | md5:40f10171c625ad3c8f51932e8f2e5ee0
 | 144.2 MB | md5:0796dd6b9e772eed6ff5388791fe005d
 | 164.9 MB | md5:4e3caea410d4257d2bd5fb52017b8b0a
 | 162.5 MB | md5:7332a1c4cf3284f51c6a32eaeacb79df
README.txt | 6.3 kB | md5:4c3ebeec1f8f0bce8e2fa120ddafe835
Additional details
Related works
- Is published in: Journal article, DOI 10.1088/1741-2552/ad2214
Funding
- SBO mandate 1S14922N (Research Foundation - Flanders)
- Junior postdoctoral fellowship 1242524N (Research Foundation - Flanders)
- FWO project G081722N (Research Foundation - Flanders)
- Internal Funds KU Leuven IDN/23/006 (KU Leuven)