Published April 24, 2024 | Version v1

Audiovisual, Gaze-controlled Auditory Attention Decoding Dataset KU Leuven (AV-GC-AAD)

Description

This dataset is described in detail in the following journal paper [1]:
Rotaru, I., Geirnaert, S., Heintz, N., Van de Ryck, I., Bertrand, A., & Francart, T. (2024). What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention. Journal of Neural Engineering, 21(1), 016017.
https://iopscience.iop.org/article/10.1088/1741-2552/ad2214/meta

*** If using this dataset, please cite the original paper above and the current Zenodo repository. ***

________________________________________________________________________________

Dataset description

This work was performed at ExpORL, Dept. Neurosciences, KU Leuven and Dept. Electrical Engineering (ESAT), KU Leuven (Belgium), with the goal of investigating and controlling for the effect of eye gaze during a competing-talker listening task.

The full dataset contains EEG and EOG data collected from 16 normal-hearing subjects during a competing-talker listening task, in which the subjects were instructed to attend to one of two simultaneously presented speech signals. Subjects 2, 5 and 6 were excluded from the online repository because they did not consent to sharing their data in a public database (cf. the signed informed consents approved by the KU Leuven Ethical Committee). EEG recordings were conducted in a soundproof, electromagnetically shielded room at ExpORL, KU Leuven. A BioSemi ActiveTwo system was used to record 64-channel EEG signals at a sampling rate of 8192 Hz. Additionally, the participants' eye movements were measured via 4 EOG (electrooculography) electrodes placed symmetrically around the eyes.

The audio signals were presented to each subject at 65 dB SPL through a pair of insert phones (Etymotic ER10). In some experimental trials, the video of the attended talker was also shown on the screen. The original speech and video stimuli (.wav and .mp4 files) are excluded from the dataset due to copyright restrictions. However, the acoustic envelopes of the attended and unattended audio stimuli were computed and are included in the dataset (see below).
The experiments were conducted using custom-made Python scripts.

The experimental trials were split into 2 blocks. Each block consisted of the following sequence of conditions: MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo. The auditory task was the same for all conditions: the subjects had to attend to one of the two presented talkers, as indicated by an arrow on the screen. The visual task differed across conditions:

  • MovingVideo: the subjects had to follow the moving video of the to-be-attended speaker presented on a randomized horizontal trajectory on the screen.
  • MovingTargetNoise: the subjects had to follow a moving cross-hair presented on a randomized horizontal trajectory on the screen.
  • NoVisuals: a black screen was presented and the subjects had to fixate on an imaginary point in the center of the screen while minimizing eye movements.
  • StaticVideo: the subjects had to fixate on the static video of the to-be-attended speaker, presented on the same side as the audio stimulus of the attended speaker.

The full description of all experimental conditions can be consulted in [1].

Each trial/condition lasted 10 minutes, with a spatial switch in attention after 5 minutes (i.e., the speech stimuli presented to the left and right insert phones were programmed to swap sides, such that after the switch the subjects kept listening to the same speaker, but now coming from the opposite spatial location). To keep the subjects motivated, they had to answer one comprehension question about the attended acoustic stimulus after each trial.

For each subject, there is a .mat file containing the following variables (a minimal Python loading sketch follows the list):
conditionID: the condition ID for each trial 
data: the preprocessed EEG and EOG data for each trial (first 64 channels are EEG, last 4 are EOG)
fs: the sampling rate of the EEG, EOG and stimuli envelopes
initAttention: the spatial location of the initial attended stimulus for each trial
metadata: the original metadata (e.g. channel names, triggers) saved in the raw .bdf files for each trial
params: the filtering parameters used for each trial
randomization: the randomization parameters (e.g. presented stimuli, attention switch times etc.) for each trial
stimulus: the precalculated envelopes for the attended and unattended stimuli for each trial
subjID: the anonymised ID of the current subject
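
The sketch below shows one possible way to load a subject file in Python with SciPy; it is not part of the dataset. The file name used here is hypothetical (use the actual file names of this record), and the per-trial array orientation (samples x channels) is an assumption to be verified against the metadata.

    import numpy as np
    from scipy.io import loadmat

    # Hypothetical file name; replace with the actual .mat file of one subject.
    mat = loadmat("subject01.mat", squeeze_me=True, struct_as_record=False)

    fs = int(mat["fs"])                 # sampling rate of the EEG, EOG and stimulus envelopes
    condition_ids = mat["conditionID"]  # condition ID per trial
    data = mat["data"]                  # per-trial arrays: first 64 channels EEG, last 4 EOG
    stimulus = mat["stimulus"]          # precomputed attended/unattended envelopes per trial

    trial0 = np.asarray(data[0])        # assumed orientation: samples x 68 channels
    eeg, eog = trial0[:, :64], trial0[:, 64:68]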

Preprocessing EEG and EOG

All of the following preprocessing steps were applied per trial. The EEG was first downsampled from 8192 Hz to 256 Hz using an antialiasing filter. The data was then band-pass filtered between 1 and 40 Hz using a zero-phase Chebyshev type II filter (80 dB attenuation at 10% outside the passband). Finally, the data was downsampled to 128 Hz to speed up computation.
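
The snippet below is an illustrative re-implementation of this pipeline with SciPy, not the authors' original code; the filter order and the use of resample_poly for the antialiased downsampling are assumptions.

    from scipy.signal import cheby2, sosfiltfilt, resample_poly

    def preprocess_eeg(raw, fs_in=8192):
        """Approximate the described per-trial EEG/EOG preprocessing (sketch only)."""
        # 1) antialiased downsampling 8192 Hz -> 256 Hz
        #    (resample_poly applies its own antialiasing filter)
        x = resample_poly(raw, up=1, down=fs_in // 256, axis=0)
        # 2) zero-phase 1-40 Hz band-pass, Chebyshev type II with 80 dB attenuation
        #    at stopband edges placed 10% outside the passband (0.9 Hz and 44 Hz);
        #    the filter order (N=8) is an assumption
        sos = cheby2(N=8, rs=80, Wn=[0.9, 44.0], btype="bandpass", fs=256, output="sos")
        x = sosfiltfilt(sos, x, axis=0)
        # 3) downsampling 256 Hz -> 128 Hz
        x = resample_poly(x, up=1, down=2, axis=0)
        return x, 128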

Speech envelope extraction

The original speech signals, sampled at 44.1 kHz, were downsampled to 8192 Hz (to match the EEG sampling rate). They were then passed through a gammatone filterbank, which roughly approximates the spectral decomposition performed by the human auditory system. Per subband, the audio envelope was extracted and its dynamic range was compressed using a power-law operation with exponent 0.6 (as proposed in [2]). Each subband envelope was then band-pass filtered with the same filter used for the EEG data. The resulting subband envelopes were summed to construct a single broadband envelope. Finally, the envelope signals were downsampled to 128 Hz to match the sampling rate of the preprocessed EEG.
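
As an illustration only, the sketch below reproduces these steps with SciPy's gammatone filter design; the number and spacing of subbands, the filter orders and the rate-conversion details are assumptions and may differ from the exact implementation used in [1], [2].

    import numpy as np
    from scipy.signal import gammatone, lfilter, cheby2, sosfiltfilt, resample_poly

    def broadband_envelope(audio, fs_audio=44100, center_freqs=np.geomspace(150, 4000, 28)):
        """Sketch of the described envelope extraction (subband layout is an assumption)."""
        # 1) downsample 44.1 kHz -> 8192 Hz to match the raw EEG rate
        x = resample_poly(audio, up=8192, down=fs_audio)
        fs = 8192
        # same 1-40 Hz zero-phase Chebyshev-II band-pass as used for the EEG (order assumed)
        sos = cheby2(N=8, rs=80, Wn=[0.9, 44.0], btype="bandpass", fs=fs, output="sos")
        env = np.zeros(len(x))
        for fc in center_freqs:
            # 2) gammatone subband filtering
            b, a = gammatone(fc, "fir", fs=fs)
            sub = lfilter(b, a, x)
            # 3) envelope (magnitude) with power-law compression, exponent 0.6
            sub_env = np.abs(sub) ** 0.6
            # 4) band-pass filter each subband envelope and sum into the broadband envelope
            env += sosfiltfilt(sos, sub_env)
        # 5) downsample to 128 Hz to match the preprocessed EEG
        return resample_poly(env, up=1, down=fs // 128)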

Notes

  1. For subjects 1-3, 6 trials corresponding to 3 conditions (MovingVideo, NoVisuals, StaticVideo) were measured.
  2. For subjects 4-16, 8 trials corresponding to 4 conditions (MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo) were measured.
  3. For subject 14, trial 2 from the StaticVideo condition was not recorded due to some technical problems.
  4. In the dataset, 'FixedVideo' is the alias name for the 'StaticVideo' condition described in [1].
  5. The EEG/EOG data was not re-referenced. Before further analysis, re-referencing the data (e.g., to an arbitrary EEG channel, or to the common average of all channels) is necessary to achieve better common-mode rejection and thus increase the SNR of the recorded data (for details, see https://www.biosemi.com/faq/cms&drl.htm). A minimal re-referencing sketch is given after this list.
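
The sketch below applies common-average referencing to the 64 EEG channels of one trial; it assumes the samples x channels orientation used in the loading example above, and is only one of several possible referencing schemes.

    import numpy as np

    def common_average_reference(trial):
        """Re-reference the 64 EEG channels to their common average; EOG channels are left untouched."""
        eeg, eog = trial[:, :64], trial[:, 64:68]
        eeg = eeg - eeg.mean(axis=1, keepdims=True)  # subtract the across-channel mean per sample
        return np.concatenate([eeg, eog], axis=1)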

References

[1] Rotaru, Iustina, et al. "What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention." Journal of Neural Engineering 21.1 (2024): 016017.

[2] Biesmans, Wouter, et al. "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario." IEEE Transactions on Neural Systems and Rehabilitation Engineering 25.5 (2016): 402-412.

Files

Files (2.0 GB in total): 13 data files (123.6 to 164.9 MB each) and README.txt (6.3 kB). MD5 checksums for all files are listed on the Zenodo record page.

Additional details

Related works

Is published in
Journal article: 10.1088/1741-2552/ad2214 (DOI)

Funding

  • SBO mandate 1S14922N (Research Foundation - Flanders)
  • Junior postdoctoral fellowship 1242524N (Research Foundation - Flanders)
  • FWO project G081722N (Research Foundation - Flanders)
  • Internal Funds KU Leuven IDN/23/006 (KU Leuven)
