Published January 15, 2026 | Version v1
Dataset Open

MEG Audiovisual Matrix-Sentence Dataset (using Natural, Avatar, Degraded and Still-Image Versions of the Speaker)

  • 1. Friedrich-Alexander-University Erlangen-Nürnberg
  • 2. Friedrich-Alexander-Universität Erlangen-Nürnberg

Description

Audiovisual Stimuli

We used German five-word matrix sentences derived from the Oldenburger Satztest. We video-recorded a speaker producing 600 randomly generated sentences; after screening for mispronunciations and disturbing sounds, 571 sentences remained (frame rate: 29.97 fps, audio sampling rate: 44.1 kHz).

We then processed the video clips into 4 different audiovisual conditions:

Natural Video: The unprocessed video recording of the speaker.

Degraded Video: A visually degraded version of each video, produced with FFmpeg's edgedetect filter (see the batch-processing sketch after this list):

ffmpeg -i input.mp4 -vf "edgedetect=low=0.1:high=0.4" output.mp4

Avatar: An avatar of the speaker generated with software from the company D-ID, which uses a CNN-based image encoder to process a still image of the talker and a GAN-based image-to-video model to animate lip movements in sync with the input audio (https://www.d-id.com/).

Still Image: A still image of the speaker combined with the audio track.
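
For reference, the sketch below shows how the edgedetect filter above could be batch-applied to all clips with Python's subprocess module. The directory names and file naming are illustrative assumptions, not the original processing script.

import subprocess
from pathlib import Path

# Illustrative directories (assumptions): natural clips in "natural/", degraded output in "degraded/"
input_dir = Path("natural")
output_dir = Path("degraded")
output_dir.mkdir(exist_ok=True)

for clip in sorted(input_dir.glob("*.mp4")):
    # Apply the same edgedetect settings as in the command above (low=0.1, high=0.4)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip),
         "-vf", "edgedetect=low=0.1:high=0.4",
         str(output_dir / clip.name)],
        check=True,
    )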

 

Experimental Design

Each participant took part in two measurement sessions. In both sessions, sentences with different visual stimuli were presented in a four-talker babble noise at -4 dB SNR. After each audiovisual sentence, the participants repeated what they had understood; after each visual-only sentence, they repeated the name they had lip-read. The sessions were structured as follows:

Session 1:

SRT50 measurement with 80 audio-only sentences (data not included due to storage limitations; available upon request)

Block 1 (audiovisual): three random sentences of each audiovisual condition in random order (12 sentences)

Block 1 (visual-only): three random sentences of each visual-only condition in random order (9 sentences)

Block 2 (audiovisual)

Block 2 (visual-only)

... (blocks 3 to 7 follow the same pattern)

Block 8 (audiovisual)

Block 8 (visual-only)

 

Session 2:

Block 1 (audiovisual)

Block 1 (visual-only)

... (blocks 2 to 11 follow the same pattern)

Block 12 (audiovisual)

Block 12 (visual-only)
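
As a consistency check, the block structure above accounts for the presentation indices listed in the overview files described below:

# Sentences per block pair: 12 audiovisual + 9 visual-only
session_1 = 8 * (12 + 9)       # 168 sentences (presentation indices 0-167)
session_2 = 12 * (12 + 9)      # 252 sentences (presentation indices 168-419)
total = session_1 + session_2  # 420 sentences per participant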

 

MEG and Behavioral Data Structure

This dataset contains MEG data from 32 participants. Each participant has a directory "participants/px" (x = participant number). Each participant folder contains a "px_overview.csv" file and a folder "participants/px/meg_data" with all MEG data.

The overview file contains the sentence presentation order and the behavioral data. It is structured as follows:

  1. Column 1: numbers the sentences in the order they were presented in the measurement sessions (index i). Sentences i = 0–167 were presented in session 1; sentences i = 168–419 were presented in session 2.
  2. Column 2: the visual condition of the presented sentence (natural, degraded, avatar, still_image).
  3. Column 3: the sentence ID (1–571) of the presented sentence (corresponding to the IDs in the stimuli directories).
  4. Column 4: 1 if the audio was audible to the participant (audiovisual stimuli), 0 if it was not (visual-only stimuli).
  5. Column 5: 1 if the presented name was understood/lip-read correctly.
  6. Column 6: 1 if the presented verb was understood correctly.
  7. Column 7: 1 if the presented amount was understood correctly.
  8. Column 8: 1 if the presented adjective was understood correctly.
  9. Column 9: 1 if the presented subject was understood correctly.

For visual-only stimuli, columns 6–9 are always 0, as participants only repeated the name. 
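
As an illustration of how the overview file can be used, the sketch below loads one overview file with pandas and computes the proportion of correctly repeated (or lip-read) names per visual condition. The file path, the assumption that the CSV has no header row, and the assigned column names are illustrative and should be checked against the actual files.

import pandas as pd

# Illustrative path for participant 1 (assumption); adjust to the actual participant folder
overview_path = "participants/p1/p1_overview.csv"

# Column names in the documented order (assumption: the CSV has no header row)
columns = ["presentation_index", "visual_condition", "sentence_id",
           "audio_audible", "name_correct", "verb_correct",
           "amount_correct", "adjective_correct", "subject_correct"]
overview = pd.read_csv(overview_path, header=None, names=columns)

# Proportion of correctly repeated/lip-read names per visual condition,
# split into audiovisual (audio_audible == 1) and visual-only (== 0) trials
name_accuracy = (overview
                 .groupby(["visual_condition", "audio_audible"])["name_correct"]
                 .mean())
print(name_accuracy)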

The "participants/px/meg_data" directory contains an "i-raw.fif" file for each of the 420 presented sentences. The files can be loaded with the MNE-Python library as follows:

import mne

# Load the raw MEG snippet for one presented sentence
# (files are located in "participants/px/meg_data/" and named "<i>-raw.fif")
meg = mne.read_raw_fif(".../1-raw.fif")

# The data can be accessed as a (n_channels x n_samples) array:
meg_data = meg.get_data()

# The measurement info can be accessed:
meg_info = meg.info

The “participants/" directory additionally contains:

  • "participants/participants_overview.csv": overview of the age and sex of the participants.
  • "read_me.txt": information on missing sentences for individual participants.

 

Stimuli Data Structure

The "stimuli" directory contains a folder for each audiovisual stimulus condition (avatar, degraded, natural, still_image). Additionally, the mp3 file of each sentence is in the folder "stimuli/mp3_files". Each of the five folders contains a version of each sentence, and each stimulus file is identifiable by its sentence ID, which ranges from 1 to 571.

 

Technical Details

The MEG data were recorded at the University Hospital in Erlangen, Germany, using a 248-magnetometer system (4D Neuroimaging, San Diego, CA, USA).

The video signal was presented via a projector located outside the shielded chamber and displayed on a screen above the participant via mirrors.

The audio signal was transmitted through 2 m long, 2 cm diameter tubes, resulting in a 6 ms delay. The stimuli used during the experiment were corrected for this delay; the stimuli provided here have their original alignment (not corrected for the setup-specific 6 ms delay).

The attended sentence and the babble noise were presented diotically to both ears at a sound pressure level of 68 dB(A).

 

Processing of MEG data

Three MEG channels were removed from all measurement data because they were broken and showed no signal. The data were analog-filtered from 1.0 to 200 Hz, offline-filtered with a notch filter (FIR, firwin design, 0.5 Hz bandwidth) at the power-line frequency and its harmonics (50, 100, 150, 200 Hz), and then resampled from 1017.25 Hz to 1000 Hz.
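
The provided files are already preprocessed as described above. For orientation only, a roughly equivalent offline notch-filtering and resampling step in MNE-Python could look like the sketch below (the path is illustrative, and the exact parameters used for this dataset may differ):

import mne

# Load one raw snippet (illustrative path)
raw = mne.io.read_raw_fif("participants/p1/meg_data/1-raw.fif", preload=True)

# Notch filter at the power-line frequency and its harmonics (FIR design)
raw.notch_filter(freqs=[50, 100, 150, 200], notch_widths=0.5, method="fir")

# Resample to 1000 Hz
raw.resample(1000)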

 

Alignment of audio and MEG data

The MEG data are cut into sentence-long snippets aligned with the mp3 files. To check the alignment, load an mp3 file and resample it to 1000 Hz, then load the MEG file corresponding to the same sentence ID; both should then have the same number of time samples.

You can use librosa to load and resample the mp3 file:

import librosa

# Load the mp3 at its native sampling rate
audio_data, sr = librosa.load(audio_path, sr=None)

# Resample the audio to 1000 Hz to match the MEG sampling rate
audio_data = librosa.resample(audio_data, orig_sr=sr, target_sr=1000)
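
Putting the pieces together, the sketch below performs the alignment check described above for one presented sentence. The mp3 file naming ("<sentence_id>.mp3") and the assumption that the overview CSV has no header row are illustrative and should be verified against the actual files.

import librosa
import mne
import pandas as pd

participant = "p1"        # illustrative participant folder
presentation_index = 1    # illustrative presentation index i

# Look up the sentence ID (column 3 of the overview file, see above);
# assumption: the CSV has no header row, so columns are numbered 0-8
overview = pd.read_csv(f"participants/{participant}/{participant}_overview.csv", header=None)
row = overview[overview[0] == presentation_index].iloc[0]
sentence_id = int(row[2])

# Load the MEG snippet for this presentation index
meg = mne.read_raw_fif(f"participants/{participant}/meg_data/{presentation_index}-raw.fif")

# Load and resample the corresponding mp3 (assumed file name: "<sentence_id>.mp3")
audio, sr = librosa.load(f"stimuli/mp3_files/{sentence_id}.mp3", sr=None)
audio = librosa.resample(audio, orig_sr=sr, target_sr=1000)

# Both should now cover the same number of samples at 1000 Hz
print(meg.n_times, len(audio))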

 

Paper to cite when using this data

Riegel et al., “Talking avatars can differentially modulate cortical speech tracking in the high and in the low delta band” (https://doi.org/10.64898/2026.01.07.695461)

 

Example Code

Example code for computing temporal response functions (TRFs) and predictor variables is provided in a repository by Alina Schüller: https://github.com/Al2606/MEG-Analysis-Pipeline

 

Files

meg_data_and_stimuli.zip (49.7 GB)
md5:22856bf0156e9562077131c668d66828

Additional details

Related works

Is part of
Preprint: 10.64898/2026.01.07.695461 (DOI)

Software

Repository URL
https://github.com/Al2606/MEG-Analysis-Pipeline
Programming language
Python