Audiovisual, Gaze-controlled Auditory Attention Decoding Dataset KU Leuven (AV-GC-AAD)
Description
This dataset is described in detail in the following journal paper [1]:
Rotaru, I., Geirnaert, S., Heintz, N., Van de Ryck, I., Bertrand, A., & Francart, T. (2024). What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention. Journal of Neural Engineering, 21(1), 016017.
https://iopscience.iop.org/article/10.1088/1741-2552/ad2214/meta
*** If using this dataset, please cite the original paper above and the current Zenodo repository. ***
________________________________________________________________________________
Dataset description
This work was performed at ExpORL, Dept. Neurosciences, KU Leuven and Dept. Electrical Engineering (ESAT), KU Leuven (Belgium), with the goal of investigating and controlling for the effect of gaze during a competing listening task.
The full dataset contains EEG and EOG data collected from 16 normal-hearing subjects during a competing listening task, in which the subjects were instructed to focus on one of two competing speech signals. However, subjects 2, 5 and 6 were excluded from the online repository because they did not consent to sharing their data in a public database (cf. signed informed consents approved by the KU Leuven Ethical Committee). EEG recordings were conducted in a soundproof, electromagnetically shielded room at ExpORL, KU Leuven. The BioSemi ActiveTwo system was used to record 64-channel EEG signals at an 8192 Hz sample rate. Additionally, the participants' eye movements were measured via 4 EOG (electrooculography) electrodes placed symmetrically around the eyes.
The audio signals were presented to each subject at 65 dB SPL through a pair of insert phones (Etymotic ER10). In some experimental trials, a video of the attended talker was also presented on the screen. The originally presented speech and video stimuli (.wav and .mp4 files) are excluded from the dataset due to copyright restrictions. However, the acoustic envelopes of the attended and unattended audio stimuli were precalculated and are included in the dataset (see below).
The experiments were conducted using custom-made Python scripts.
The experimental trials were split into two blocks. Each block consisted of the following sequence of conditions: MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo. The auditory task was the same in all conditions: the subjects had to attend to one of the two presented talkers, as indicated by an arrow on the screen. The visual task differed across conditions:
- MovingVideo: the subjects had to follow the moving video of the to-be-attended speaker presented on a randomized horizontal trajectory on the screen.
- MovingTargetNoise: the subjects had to follow a moving cross-hair presented on a randomized horizontal trajectory on the screen.
- NoVisuals: a black screen was presented and the subjects had to fixate on an imaginary point in the center of the screen while minimizing eye movements.
- StaticVideo: the subjects had to fixate on the static video of the to-be-attended speaker, presented on the same side as the audio stimulus of the attended speaker.
The full description of all experimental conditions can be consulted in [1].
Each trial/condition lasted 10 minutes, with a spatial switch in attention after 5 minutes (i.e., the speech stimuli presented to the left and right insert phones were programmed to swap sides, such that after the switch the subjects kept listening to the same speaker, but coming from the opposite spatial location). To keep the subjects motivated, they had to answer one comprehension question about the attended acoustic stimulus after each trial.
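For per-location analyses, a trial can thus be split at the attention switch. The following minimal Python sketch assumes the preprocessed 128 Hz data described below and the nominal 5-minute switch time; the exact per-trial switch samples should be taken from the randomization variable described below:

    import numpy as np

    fs = 128                               # sampling rate of the preprocessed data (see below)
    switch_sample = 5 * 60 * fs            # nominal attention switch after 5 minutes
    trial = np.zeros((10 * 60 * fs, 68))   # placeholder for one trial: samples x (64 EEG + 4 EOG)
    first_half = trial[:switch_sample]     # attention at the initial side (cf. initAttention)
    second_half = trial[switch_sample:]    # same talker, opposite spatial location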
For each subject, there is a .mat file containing the following variables (a loading sketch in Python is given after the list):
- conditionID: the condition ID for each trial
- data: the preprocessed EEG and EOG data for each trial (the first 64 channels are EEG, the last 4 are EOG)
- fs: the sampling rate of the EEG, EOG and stimulus envelopes
- initAttention: the spatial location of the initially attended stimulus for each trial
- metadata: the original metadata (e.g., channel names, triggers) saved in the raw .bdf files for each trial
- params: the filtering parameters used for each trial
- randomization: the randomization parameters (e.g., presented stimuli, attention switch times) for each trial
- stimulus: the precalculated envelopes of the attended and unattended stimuli for each trial
- subjID: the anonymised ID of the current subject
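The .mat files can be inspected in Python, for example. This is a minimal sketch assuming the files were saved in a pre-v7.3 MATLAB format (for v7.3 files an HDF5 reader such as h5py is needed instead); the filename and the exact nesting of the per-trial variables are assumptions:

    import scipy.io as sio

    # load one subject file (hypothetical filename)
    mat = sio.loadmat("subject01.mat", squeeze_me=True, struct_as_record=False)

    fs = int(mat["fs"])              # common sampling rate of EEG, EOG and envelopes
    data = mat["data"]               # one array per trial: samples x 68 channels
    conditions = mat["conditionID"]  # condition label per trial

    trial = data[0]
    eeg = trial[:, :64]  # first 64 channels: EEG
    eog = trial[:, 64:]  # last 4 channels: EOG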
EEG and EOG preprocessing
All of the following preprocessing steps were applied per trial. The EEG was first downsampled from 8192 Hz to 256 Hz using an antialiasing filter. The data was then bandpass-filtered between 1 and 40 Hz using a zero-phase Chebyshev filter (type II, with 80 dB attenuation at 10% outside the passband). Finally, the data was downsampled to 128 Hz to speed up computation.
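For reference, an equivalent chain can be expressed with scipy.signal as follows. This is a sketch, not the original code: the filter order (N=8) is an illustrative assumption, and the zero-phase response is obtained by forward-backward filtering:

    from scipy import signal

    def preprocess_trial(raw, fs_in=8192):
        # antialiasing downsample 8192 Hz -> 256 Hz (factor 32, cascaded as 8 * 4)
        x = signal.decimate(raw, 8, axis=0, zero_phase=True)
        x = signal.decimate(x, 4, axis=0, zero_phase=True)
        fs = fs_in // 32  # 256 Hz

        # zero-phase 1-40 Hz Chebyshev type-II bandpass; stopband edges sit
        # 10% outside the passband with 80 dB attenuation (order N=8 assumed)
        sos = signal.cheby2(N=8, rs=80, Wn=[0.9, 44.0], btype="bandpass",
                            fs=fs, output="sos")
        x = signal.sosfiltfilt(sos, x, axis=0)

        # final downsample 256 Hz -> 128 Hz
        x = signal.decimate(x, 2, axis=0, zero_phase=True)
        return x, fs // 2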
Speech envelope extraction
The original speech signals, sampled at 44100 Hz, were downsampled to 8192 Hz (to match the EEG sampling rate). They were then passed through a gammatone filterbank, which roughly approximates the spectral decomposition performed by the human auditory system. Per subband, the audio envelope was extracted and its dynamic range was compressed using a power-law operation with exponent 0.6 (as proposed in [2]). Each subband envelope was then bandpass-filtered with the same filter used for the EEG data. The resulting subband envelopes were summed to construct a single broadband envelope. Finally, the envelope signals were downsampled to 128 Hz to match the sampling rate of the preprocessed EEG.
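A rough Python sketch of this pipeline is given below; the gammatone implementation (scipy.signal.gammatone), the set of 28 center frequencies, and the bandpass filter order are assumptions here, and the exact filterbank is the one specified in [2]:

    import numpy as np
    from scipy import signal

    def broadband_envelope(audio, fs_audio=44100):
        # 44100 Hz -> 8192 Hz (ratio reduced by the gcd of 4: 2048/11025)
        x = signal.resample_poly(audio, up=2048, down=11025)
        fs = 8192

        # same zero-phase 1-40 Hz Chebyshev type-II bandpass as used for the EEG
        sos = signal.cheby2(8, 80, [0.9, 44.0], btype="bandpass", fs=fs, output="sos")

        env = np.zeros_like(x)
        for cf in np.geomspace(150.0, 4000.0, 28):     # assumed center frequencies
            b, a = signal.gammatone(cf, "iir", fs=fs)  # 4th-order gammatone subband filter
            sub = signal.lfilter(b, a, x)
            sub_env = np.abs(sub) ** 0.6               # envelope + power-law compression
            env += signal.sosfiltfilt(sos, sub_env)    # bandpass-filter the subband envelope

        # 8192 Hz -> 128 Hz to match the preprocessed EEG (factor 64 = 8 * 8)
        env = signal.decimate(env, 8, zero_phase=True)
        env = signal.decimate(env, 8, zero_phase=True)
        return env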
Notes
- For subjects 1-3, 6 trials corresponding to 3 conditions (MovingVideo, NoVisuals, StaticVideo) were measured.
- For subjects 4-16, 8 trials corresponding to 4 conditions (MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo) were measured.
- For subject 14, trial 2 of the StaticVideo condition was not recorded due to technical problems.
- In the dataset, 'FixedVideo' is the alias name for the 'StaticVideo' condition described in [1].
- The EEG/EOG data was not referenced. Before further analysis, re-referencing the data (e.g., to an arbitrary EEG channel, or to the common average of all channels) is necessary to achieve better common-mode rejection and thus increase the SNR of the recorded data (for details, see https://www.biosemi.com/faq/cms&drl.htm). A minimal re-referencing sketch is given below.
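For example, common-average re-referencing of the 64 EEG channels takes one line (assuming eeg is a samples x channels array as in the loading sketch above):

    import numpy as np

    def common_average_reference(eeg):
        # subtract the instantaneous mean over all channels from each channel
        return eeg - eeg.mean(axis=1, keepdims=True)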
References
[1] Rotaru, Iustina, et al. "What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention." Journal of Neural Engineering 21.1 (2024): 016017.
[2] Biesmans, Wouter, et al. "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario." IEEE Transactions on Neural Systems and Rehabilitation Engineering 25.5 (2016): 402-412.
Files (2.0 GB)

Name | Size | MD5
---|---|---
 | 123.6 MB | md5:2145a5838217d3a51af4b3f62ee7ab47
 | 123.6 MB | md5:79b71bce19cddef42f006ba7764e1403
 | 164.9 MB | md5:755124839d300da7d71b1947927a020e
 | 164.8 MB | md5:dbabf01f6c966a5892e1413114e0a0d3
 | 164.8 MB | md5:fbf55ec5fad4057d9985064892f488f9
 | 164.9 MB | md5:dc15f5069c74079332b6658cb781c2d8
 | 164.9 MB | md5:0fa7f8d1b2c6919108ea911ccc00e432
 | 164.9 MB | md5:9878079ef05386134589e7a499ae0186
 | 164.8 MB | md5:8e5bb4b8c88f4277a88f09b79fe2de5e
 | 164.9 MB | md5:40f10171c625ad3c8f51932e8f2e5ee0
 | 144.2 MB | md5:0796dd6b9e772eed6ff5388791fe005d
 | 164.9 MB | md5:4e3caea410d4257d2bd5fb52017b8b0a
 | 162.5 MB | md5:7332a1c4cf3284f51c6a32eaeacb79df
README.txt | 6.3 kB | md5:4c3ebeec1f8f0bce8e2fa120ddafe835
Additional details
Related works
- Is published in: Journal article, DOI 10.1088/1741-2552/ad2214
Funding
- SBO mandate 1S14922N (Research Foundation - Flanders)
- Junior postdoctoral fellowship 1242524N (Research Foundation - Flanders)
- FWO project G081722N (Research Foundation - Flanders)
- Internal Funds KU Leuven IDN/23/006 (KU Leuven)