Multi-TPC: A Multimodal Dataset for Three-Party Conversations with Speech, Motion, and Gaze

Lee, Meng-Chen

doi:10.5281/zenodo.17935560

Published December 24, 2025 | Version v1

Dataset | Open access

Contributors

Data collector:

Lee, Meng-Chen

Supervisor:

Deng, Zhigang¹

1. University of Houston

The dataset comprises multiple synchronized modalities. Motion capture and gaze data are provided as frame-indexed plain-text (TXT) files, with joint rotations represented as Euler angles (in degrees) and gaze pitch and yaw angles (in degrees) derived from eye-tracker measurements. Audio recordings are distributed as uncompressed WAV files sampled at 44.1 kHz, and word-level transcripts are provided as CSV files listing tokenized words with aligned onset and offset times (in seconds). Prosodic features extracted from the audio signal are stored in a dedicated Prosody directory. In addition, an integrated annotation resource (AudioGazeGBack) combines conversational state labels, gaze information, and head gesture annotations into a unified representation.
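Because gaze is given as pitch and yaw angles rather than as a 3D vector, many analyses first convert it to a direction vector. The sketch below is hypothetical: the function name `gaze_direction` and the axis convention (+z forward, +y up, positive yaw turning toward +x) are assumptions, and the actual reference frame and sign conventions should be taken from the dataset README.

```python
import math


def gaze_direction(pitch_deg: float, yaw_deg: float) -> tuple[float, float, float]:
    """Convert gaze pitch and yaw (degrees) to a unit 3D direction vector.

    ASSUMED convention (not taken from the dataset): right-handed frame with
    +z pointing forward, +y up and +x to the participant's left; positive
    pitch looks up and positive yaw turns toward +x. Adjust the signs to
    match the convention documented in the README.
    """
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    return (
        math.cos(pitch) * math.sin(yaw),  # lateral component
        math.sin(pitch),                  # vertical component
        math.cos(pitch) * math.cos(yaw),  # forward component
    )
```

Under this convention a gaze of pitch 0° and yaw 0° maps to the forward axis `(0.0, 0.0, 1.0)`, and every result has unit length, so vectors from different frames can be compared directly with dot products.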

Data are organized by modality at the top level, with separate directories for Mocap, Gaze, Audio, Word, Prosody, and AudioGazeGBack. Within each modality directory, files are grouped by recording date (e.g., 01-28-2022, 03-04-2022, 03-11-2022), corresponding to individual recording sessions. This date-based organization enables consistent alignment of all modalities collected during the same session. All frame-based modalities share a common temporal resolution of 60 Hz to support synchronized multimodal analysis.
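Since audio is sampled at 44.1 kHz and the frame-based modalities run at 60 Hz, each frame spans exactly 44100 / 60 = 735 audio samples, and word onset and offset times in seconds map to frame indices by multiplying by 60. The following minimal sketch assumes that all streams of a session share the same time origin (t = 0 at frame 0 and at the first audio sample); the helper names are illustrative and not part of the dataset's tooling.

```python
import math

FRAME_RATE_HZ = 60      # common rate of the frame-based modalities
AUDIO_RATE_HZ = 44_100  # WAV sampling rate
SAMPLES_PER_FRAME = AUDIO_RATE_HZ // FRAME_RATE_HZ  # 735 samples per frame

# Small tolerance so times that land exactly on a frame boundary are not
# pushed into the neighbouring frame by floating-point rounding.
_EPS = 1e-9


def seconds_to_frame(t: float) -> int:
    """Index of the 60 Hz frame that contains time t (in seconds)."""
    return math.floor(t * FRAME_RATE_HZ + _EPS)


def word_frame_span(onset: float, offset: float) -> tuple[int, int]:
    """Half-open frame range [start, end) covered by a word's onset/offset."""
    start = seconds_to_frame(onset)
    end = math.ceil(offset * FRAME_RATE_HZ - _EPS)
    return start, max(end, start + 1)  # always cover at least one frame


def frame_to_sample_range(frame: int) -> tuple[int, int]:
    """Half-open range of WAV sample indices that fall within a frame."""
    return frame * SAMPLES_PER_FRAME, (frame + 1) * SAMPLES_PER_FRAME
```

For example, a word with onset 1.0 s and offset 1.25 s covers frames 60 through 74, so `word_frame_span(1.0, 1.25)` returns `(60, 75)`, and `frame_to_sample_range(2)` returns `(1470, 2205)`.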

For the Mocap, Gaze, Audio, and Word modalities, data are stored separately for each participant. The Prosody and AudioGazeGBack directories are exceptions: they contain session-level files that integrate information across all participants, stored directly in the modality directory with the recording date as a file-name prefix. In the file naming conventions below, D denotes the recording date, s the session index, and n the participant index.

| Modality | File name pattern |
|---|---|
| Motion | `Mocap/D/Session_s_PC_n_mocap_data.txt` |
| Gaze | `Gaze/D/Session_s_PC_n_EyeTracker_data_gapfilled.txt` |
| Audio | `Audio/D/Session_s_PC_n_audio.wav` |
| Text | `Word/D/Session_s_PC_n_words.csv` |
| Prosody | `Prosody/D_Session_s_prosody.csv` |
| Annotated | `AudioGazeGBack/D_Session_s_audio_gaze_gback.csv` |
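The naming patterns can be expanded programmatically when iterating over sessions and participants. This sketch uses a hypothetical helper, `session_paths`, and assumes the session index s and participant index n appear as plain integers (for example `Session_1_PC_2`) with no zero padding; verify this against the actual file names before relying on it.

```python
from pathlib import Path


def session_paths(root, date: str, session: int, participant: int) -> dict[str, Path]:
    """Build the expected file paths for one participant in one session.

    ASSUMPTION: indices are written without zero padding (Session_1, PC_2).
    """
    root = Path(root)
    stem = f"Session_{session}"
    pc = f"{stem}_PC_{participant}"
    return {
        # Per-participant files live in date-named subdirectories.
        "mocap": root / "Mocap" / date / f"{pc}_mocap_data.txt",
        "gaze": root / "Gaze" / date / f"{pc}_EyeTracker_data_gapfilled.txt",
        "audio": root / "Audio" / date / f"{pc}_audio.wav",
        "words": root / "Word" / date / f"{pc}_words.csv",
        # Session-level files carry the date as a file-name prefix instead.
        "prosody": root / "Prosody" / f"{date}_{stem}_prosody.csv",
        "annotated": root / "AudioGazeGBack" / f"{date}_{stem}_audio_gaze_gback.csv",
    }
```

A quick consistency check after extracting the archive could then be `[k for k, p in session_paths("Dataset", "01-28-2022", 1, 2).items() if not p.exists()]`, which lists any expected file that is missing under these assumptions.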

The AudioGazeGBack directory contains processed, frame-level features that represent each moment of interaction across all participants. These features include speaking activity for each participant, gaze direction labels indicating whether a participant is looking toward the left or right listener relative to the current speaker, and gestural backchannel annotations capturing head nodding and shaking.
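Frame-level speaking activity is often easier to analyse as time intervals. Assuming a participant's speaking activity (or a nod or shake flag) is stored as a 0/1 value per 60 Hz frame, the hypothetical helper below collapses runs of active frames into (start, end) intervals in seconds. The actual column names and label encoding are documented in the README and are not assumed here.

```python
FRAME_RATE_HZ = 60  # frame rate of the AudioGazeGBack features


def active_segments(flags, frame_rate=FRAME_RATE_HZ):
    """Collapse a per-frame 0/1 sequence into half-open (start_s, end_s) intervals.

    Any truthy value counts as active; an interval ends at the first
    inactive frame after it.
    """
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # a new active run begins
        elif not flag and start is not None:
            segments.append((start / frame_rate, i / frame_rate))
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append((start / frame_rate, len(flags) / frame_rate))
    return segments
```

With one frame per second for readability, `active_segments([0, 1, 1, 0, 1], frame_rate=1)` returns `[(1.0, 3.0), (4.0, 5.0)]`. Intervals from different participants can then be intersected to locate overlapping speech or backchannels produced while another participant is talking.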

Detailed descriptions of column headings, abbreviations, units, and file formats are provided in accompanying README files. The full data processing and preprocessing pipeline used to generate these records is documented and made available through the project GitHub repository (https://github.com/MCMartinLee/Multi-TPC) to support transparency and reproducibility.

Files (5.5 GB)

| Name | Size | MD5 checksum |
|---|---|---|
| Dataset.zip | 5.5 GB | `e4493b9e97f2046550220672cef2e4ad` |
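The MD5 checksum listed for Dataset.zip can be used to confirm that the 5.5 GB download completed without corruption. A minimal Python sketch using the standard `hashlib` module, reading the archive in chunks so it is never loaded into memory at once (the helper name `md5sum` is illustrative):

```python
import hashlib

EXPECTED_MD5 = "e4493b9e97f2046550220672cef2e4ad"  # checksum published for Dataset.zip


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `md5sum("Dataset.zip") == EXPECTED_MD5` should evaluate to `True`; on Linux, the GNU coreutils command `md5sum Dataset.zip` reports the same digest.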

Additional details

Funding: U.S. National Science Foundation, CHS: Small: An Analysis-and-Synthesis Framework for Small Group Conversations (award 2005430)

Repository URL: https://github.com/MCMartinLee/Multi-TPC
Programming languages: Python, MATLAB, C++

