This dataset is described in detail in the following journal paper:

Rotaru, I., Geirnaert, S., Heintz, N., Van de Ryck, I., Bertrand, A., & Francart, T. (2024). What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention. Journal of Neural Engineering, 21(1), 016017. https://iopscience.iop.org/article/10.1088/1741-2552/ad2214/meta

*** If using this dataset, please cite the original paper above and the current Zenodo repository (10.5281/zenodo.11058711). ***

This work was performed at ExpORL, Dept. Neurosciences, KU Leuven and Dept. Electrical Engineering (ESAT), KU Leuven (Belgium), with the goal of investigating and controlling for the effect of gaze during a competing listening task.

The full dataset contains EEG and EOG data collected from 16 normal-hearing subjects during a competing listening task, in which the subjects were instructed to focus on one of two competing speech signals. Subjects 2, 5 and 6 were excluded from the online repository because they did not consent to sharing their data in a public database (cf. signed informed consents approved by the KU Leuven Ethical Committee).

EEG recordings were conducted in a soundproof, electromagnetically shielded room at ExpORL, KU Leuven. The BioSemi ActiveTwo system was used to record 64-channel EEG signals at a 8192 Hz sample rate. Additionally, the participants' gaze movements were measured via 4 EOG (electrooculography) electrodes placed symmetrically around the eyes. The audio signals were presented to each subject at 65 dB SPL through a pair of insert phones (Etymotic ER10). In some experimental trials, the video depicting the attended talker was also presented on the screen. The originally presented speech and video stimuli (.wav and .mp4 files) are excluded from the dataset due to copyright. However, the acoustic envelopes of the attended and unattended audio stimuli were calculated and are included in the dataset (see below).

The experiments were conducted using custom-made Python scripts. The experimental trials were split into 2 blocks. Each block consisted of the following sequence of conditions: MovingVideo, MovingTargetNoise, NoVisuals, StaticVideo. The auditory task was the same for all conditions: the subjects had to attend to one of the two presented talkers, as indicated by an arrow on the screen. The visual task differed across conditions:
- MovingVideo: the subjects had to follow the moving video of the to-be-attended speaker, presented on a randomized horizontal trajectory on the screen.
- MovingTargetNoise: the subjects had to follow a moving crosshair presented on a randomized horizontal trajectory on the screen.
- NoVisuals: a black screen was presented and the subjects had to fixate on an imaginary point in the center of the screen while minimizing eye movements.
- StaticVideo: the subjects had to fixate on the static video of the to-be-attended speaker, presented on the same side as the audio stimulus of the attended speaker.
The full description of all experimental conditions can be consulted in Rotaru et al. (2024).

For each subject, there is a .mat file containing the following variables (a minimal loading sketch is given after the list):
- conditionID: the condition ID for each trial
- data: the preprocessed EEG and EOG data for each trial (the first 64 channels are EEG, the last 4 are EOG)
- fs: the sampling rate of the EEG, EOG and stimulus envelopes
- initAttention: the spatial location of the initially attended stimulus for each trial
- metadata: the original metadata (e.g. channel names, triggers) saved in the raw .bdf files for each trial
- params: the filtering parameters used for each trial
- randomization: the randomization parameters (e.g. presented stimuli, attention switch times) for each trial
- stimulus: the precalculated envelopes of the attended and unattended stimuli for each trial
- subjID: the anonymised ID of the current subject
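The snippet below is a minimal sketch of how such a file might be inspected in Python. It assumes the .mat files are stored in a pre-v7.3 MATLAB format readable by scipy.io.loadmat (a v7.3/HDF5 file would require h5py or mat73 instead), that each trial's data array is ordered samples x channels, and uses a hypothetical file name; the exact nesting of the per-trial variables may differ.

# Minimal loading sketch (assumptions: pre-v7.3 .mat format,
# per-trial arrays stored as samples x channels, hypothetical file name).
from scipy.io import loadmat

mat = loadmat("subject01.mat", squeeze_me=True)

fs = int(mat["fs"])                  # sampling rate of EEG/EOG and envelopes
condition_ids = mat["conditionID"]   # condition label per trial
trials = mat["data"]                 # one array per trial: 64 EEG + 4 EOG channels
envelopes = mat["stimulus"]          # attended/unattended envelopes per trial

trial = trials[0]                    # first trial (layout assumed samples x channels)
eeg, eog = trial[:, :64], trial[:, 64:]
print(fs, eeg.shape, eog.shape)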
Each trial/condition lasted 10 minutes, with a spatial switch in attention after 5 minutes (i.e., the speech stimuli presented to the left and right insert phones were programmed to swap sides, such that the subjects kept listening to the same speaker, but coming from the opposite spatial location). To keep the subjects motivated, they had to answer one comprehension question related to the attended acoustic stimulus after each trial.

Preprocessing EEG and EOG
All the following preprocessing steps were applied per trial. The EEG was first downsampled from 8192 Hz to 256 Hz using an antialiasing filter. The data was then bandpass filtered between 1 and 40 Hz using a zero-phase Chebyshev filter (type II, with 80 dB attenuation at 10% outside the passband). Finally, the data was downsampled to 128 Hz to speed up computation.

Speech envelope extraction
The original speech signals at 44100 Hz were downsampled to 8192 Hz (to match the EEG sampling rate). They were then passed through a gammatone filterbank, which roughly approximates the spectral decomposition performed by the human auditory system. Per subband, the audio envelope was extracted and its dynamic range was compressed using a power-law operation with exponent 0.6 (as proposed in Biesmans et al. 2016). Each subband was then bandpass filtered with the same filter as used for the EEG. The resulting subband envelopes were summed to construct a single broadband envelope. Finally, the envelope signals were downsampled to 128 Hz to match the sampling rate of the preprocessed EEG.

Notes
For subjects 1-3, 6 trials corresponding to 3 conditions (MovingVideo, NoVisuals, FixedVideo) were measured. For subjects 4-16, 8 trials corresponding to 4 conditions (MovingVideo, MovingTargetNoise, NoVisuals, FixedVideo) were measured. For subject 14, trial 2 of the StaticVideo condition was not recorded due to technical problems. In the dataset, 'FixedVideo' is the alias for the 'StaticVideo' condition described in Rotaru et al. (2024).

The EEG/EOG data was not referenced. Before further analysis, re-referencing the data (e.g., to an arbitrary EEG channel, or to the common average of all channels) is necessary to achieve better common-mode rejection and thus increase the SNR of the recorded data (for details, see https://www.biosemi.com/faq/cms&drl.htm).
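As a starting point, the snippet below sketches the recommended re-referencing with NumPy. It assumes one trial's EEG has already been extracted as a samples x channels array (the first 64 channels of data); adapt the axis if your layout is channels x samples.

# Minimal re-referencing sketch (assumption: `eeg` is a samples x channels
# NumPy array holding the 64 EEG channels of one trial).
import numpy as np

def common_average_reference(eeg):
    # Subtract the mean over all EEG channels from every channel (CAR).
    return eeg - eeg.mean(axis=1, keepdims=True)

def single_channel_reference(eeg, ref_idx=0):
    # Alternatively, re-reference to one (arbitrary) EEG channel.
    return eeg - eeg[:, [ref_idx]]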