Published May 24, 2022 | Version 0
Preprint (Open Access)

Optimal time lags for linear cortical auditory attention detection: differences between speech and music listening

  • 1. Department of Electronic Systems, Aalborg University, Denmark; Bang & Olufsen, Denmark
  • 2. Department of Electronic Systems, Aalborg University, Denmark
  • 3. Bang & Olufsen, Denmark; Department of Electronic Systems, Aalborg University, Denmark
  • 4. Bionic Institute, Australia; Department of Clinical Medicine, Aalborg University, Denmark
  • 1. Lyon Neuroscience Research Center, CNRS UMR5292, Inserm U1028, Université Claude Bernard Lyon 1, Université Jean Monnet Saint-Étienne, Lyon, France
  • 2. ENTPE, Laboratoire Génie Civil et Bâtiment, Vaulx-en-Velin, France
  • 3. Starkey France, Créteil, France


In recent decades, there has been considerable interest in detecting auditory attention from brain signals. Cortical recordings have been shown to be useful in determining which speaker a person is attending to in a mixture of sounds (the cocktail party effect). Linear regression, often called the stimulus reconstruction method, shows that the envelope of the sounds heard can be reconstructed from continuous electroencephalography (EEG) recordings. The target sound, to which the listener is paying attention, can be reconstructed more accurately than the other sounds present in the scene, which allows attention decoding. Reconstruction is obtained with EEG signals that are delayed relative to the audio signal, to account for neural processing time. This makes it possible to identify the latencies at which reconstruction is optimal, which reflect cortical processes specific to the type of audio heard. However, most of these studies used only speech signals and did not investigate other types of auditory stimuli, such as music.
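The stimulus reconstruction idea can be sketched with a toy example: time-shifted copies of the EEG channels form a lagged design matrix, which is regressed onto the stimulus envelope with ridge regression, and reconstruction accuracy is measured as the correlation between the reconstructed and actual envelopes. Everything below (sampling rate, lag range, regularization strength, and the synthetic signals) is an illustrative assumption, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a synthetic example.
fs = 64                                  # EEG sampling rate (Hz)
n_samples, n_channels = 2000, 8
envelope = rng.standard_normal(n_samples)

# Simulate EEG whose channels track the envelope at a ~100 ms neural delay.
delay = int(0.1 * fs)
eeg = rng.standard_normal((n_samples, n_channels))
eeg[delay:, :] += 0.5 * envelope[:-delay, None]

def lagged_design(eeg, lags):
    """Stack time-shifted copies of the EEG (one column block per lag)."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, lag in enumerate(lags):
        # Envelope at time t is predicted from EEG at time t + lag.
        X[:n - lag, i * c:(i + 1) * c] = eeg[lag:]
    return X

lags = list(range(0, 16))                # 0-250 ms of post-stimulus EEG at 64 Hz
X = lagged_design(eeg, lags)

# Closed-form ridge regression: w = (X'X + aI)^-1 X'y
alpha = 1e2
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ envelope)
reconstruction = X @ w
r = np.corrcoef(reconstruction, envelope)[0, 1]
print(f"reconstruction accuracy r = {r:.2f}")
```

In practice the decoder would be trained and evaluated on separate data; the closed-form solve stands in for whatever ridge implementation the study actually used.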

In the present study, we applied this stimulus reconstruction method to decode auditory attention in a cocktail party scenario that included both speech and music. Participants were presented with a target sound (either speech or music) and a distractor sound (either speech or music) while their cortical responses were continuously recorded with a 64-channel EEG system. From these recordings, we reconstructed the envelopes of both the target and distractor stimuli using linear ridge regression decoding models trained at individual time lags. Results showed different time lags for maximal reconstruction accuracy between music and speech listening, suggesting separate underlying cortical processes. Results also suggest that attention can influence reconstruction accuracy at middle/late time lags.
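A minimal sketch of per-lag attention decoding, under the illustrative assumption that the EEG tracks the attended envelope more strongly than the ignored one: a single-lag ridge decoder is trained for each lag on the first half of the data and evaluated on the second half, and attention is decoded by comparing the peak reconstruction accuracies of the two streams. All signals and parameter values are hypothetical, not the study's data or settings.

```python
import numpy as np

rng = np.random.default_rng(1)

fs, n, c = 64, 4000, 8
target = rng.standard_normal(n)          # attended stream's envelope
distractor = rng.standard_normal(n)      # ignored stream's envelope

# Assumed ~150 ms neural delay; attended stream is weighted more strongly.
delay = int(0.15 * fs)
eeg = rng.standard_normal((n, c))
eeg[delay:] += 0.6 * target[:-delay, None] + 0.2 * distractor[:-delay, None]

def ridge_corr(eeg, env, lag, alpha=1e2):
    """Train a single-lag ridge decoder on the first half, test on the second."""
    n, c = eeg.shape
    X = np.zeros_like(eeg)
    X[:n - lag] = eeg[lag:]              # envelope at t decoded from EEG at t + lag
    half = n // 2
    w = np.linalg.solve(X[:half].T @ X[:half] + alpha * np.eye(c),
                        X[:half].T @ env[:half])
    rec = X[half:] @ w
    return np.corrcoef(rec, env[half:])[0, 1]

lags = range(0, 32)                      # 0-500 ms at 64 Hz
r_target = [ridge_corr(eeg, target, lag) for lag in lags]
r_distractor = [ridge_corr(eeg, distractor, lag) for lag in lags]

best_lag = int(np.argmax(r_target))
attended_is_target = max(r_target) > max(r_distractor)
print(f"peak lag: {best_lag / fs * 1000:.0f} ms; "
      f"attention decoded correctly: {attended_is_target}")
```

The lag at which `r_target` peaks plays the role of the "optimal time lag" discussed above; the paper's finding is that this peak falls at different lags for speech and for music.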


  • Innovation Fund Denmark (Innovationsfonden): 9065-00270B


