Published December 24, 2025 | Version v1
Dataset Open

Multi-TPC: A Multimodal Dataset for Three-Party Conversations with Speech, Motion, and Gaze

Authors/Creators

Contributors

Data collector:

Supervisor:

  • 1. University of Houston

Description

The dataset comprises multiple synchronized modalities. Motion capture and gaze data are provided as plain-text (TXT) files indexed at the frame level, with joint rotations represented as Euler angles (in degrees) and gaze pitch and yaw angles (in degrees) derived from eye-tracker measurements. Audio recordings are distributed in uncompressed WAV format with a sampling rate of 44.1 kHz, and word-level transcripts are provided as plain-text files containing tokenized words with aligned onset and offset times (in seconds). Prosodic features extracted from the audio signal are stored in a dedicated directory. In addition, an integrated annotation resource combines conversational state labels, gaze information, and head gesture annotations into a unified representation.
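As a quick sanity check on the audio format described above, the following minimal sketch reads a WAV header with Python's standard `wave` module and compares the sampling rate against the stated 44.1 kHz. The function name `describe_wav` is illustrative and not part of the dataset tooling, and the sketch assumes the files are plain PCM WAV, which is what the `wave` module reads.

```python
import wave

EXPECTED_RATE_HZ = 44_100  # sampling rate stated in the dataset description


def describe_wav(path):
    """Return basic header information for a PCM WAV file.

    Hypothetical helper: it only inspects the header and does not
    load or decode the audio samples.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        n_samples = wav.getnframes()   # sample frames per channel
        channels = wav.getnchannels()
    return {
        "rate_hz": rate,
        "channels": channels,
        "duration_s": n_samples / rate,
        "matches_expected_rate": rate == EXPECTED_RATE_HZ,
    }
```

Calling `describe_wav` on one of the participant audio files should report a rate of 44100 if the file matches the description; the channel count is not stated in the description, so it is reported rather than checked.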

Data are organized by modality at the top level, with separate directories for Mocap, Gaze, Audio, Word, Prosody, and AudioGazeGBack. Within each modality directory, files are grouped by recording date (e.g., 01-28-2022, 03-04-2022, 03-11-2022), corresponding to individual recording sessions. This date-based organization enables consistent alignment of all modalities collected during the same session. All frame-based modalities share a common temporal resolution of 60 Hz to support synchronized multimodal analysis.
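Because the frame-based modalities share a 60 Hz rate while audio is sampled at 44.1 kHz and word timings are given in seconds, moving between the three time bases is simple arithmetic. The sketch below illustrates it under two assumptions that should be checked against the README files: frame indices start at 0, and frame 0 coincides with time 0 of the audio. All function names are illustrative.

```python
import math

FRAME_RATE_HZ = 60        # common rate of the frame-based modalities
AUDIO_RATE_HZ = 44_100    # audio sampling rate
SAMPLES_PER_FRAME = AUDIO_RATE_HZ // FRAME_RATE_HZ  # 735 audio samples per frame


def frame_to_seconds(frame_index):
    """Time of a frame, assuming frame 0 is at t = 0 (an assumption)."""
    return frame_index / FRAME_RATE_HZ


def frame_to_sample(frame_index):
    """Index of the first audio sample that falls inside a frame."""
    return frame_index * SAMPLES_PER_FRAME


def word_to_frame_span(onset_s, offset_s):
    """Frames overlapped by a word with the given onset/offset in seconds.

    The product is rounded to 6 decimals before floor/ceil so that
    values such as 0.1 * 60 do not spill into an extra frame because
    of floating-point error. At least one frame is always returned.
    """
    start = math.floor(round(onset_s * FRAME_RATE_HZ, 6))
    end = math.ceil(round(offset_s * FRAME_RATE_HZ, 6))
    return range(start, max(end, start + 1))
```

For example, a word spanning 0.5 s to 1.0 s covers frames 30 through 59, which can then be used to slice the motion or gaze rows for that word.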

For most modalities, data are stored separately for each participant. The Prosody and AudioGazeGBack directories are exceptions, as they contain session-level files that integrate information across all participants. In the file naming conventions below, D denotes the recording date, s the session index, and n the participant index.

Modality    File name pattern
Motion      Mocap/D/Session_s_PC_n_mocap_data.txt
Gaze        Gaze/D/Session_s_PC_n_EyeTracker_data_gapfilled.txt
Audio       Audio/D/Session_s_PC_n_audio.wav
Text        Word/D/Session_s_PC_n_words.csv
Prosody     Prosody/D_Session_s_prosody.csv
Annotated   AudioGazeGBack/D_Session_s_audio_gaze_gback.csv
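The naming patterns above can be turned into a small path builder. This is a sketch rather than part of the released tooling: the function name `dataset_path` and the root directory are hypothetical, and it assumes that s and n are written as plain integers without zero padding, which should be confirmed against the extracted archive.

```python
from pathlib import Path

# Patterns copied from the file naming table; d = date, s = session, n = participant.
PER_PARTICIPANT = {
    "Motion": "Mocap/{d}/Session_{s}_PC_{n}_mocap_data.txt",
    "Gaze": "Gaze/{d}/Session_{s}_PC_{n}_EyeTracker_data_gapfilled.txt",
    "Audio": "Audio/{d}/Session_{s}_PC_{n}_audio.wav",
    "Text": "Word/{d}/Session_{s}_PC_{n}_words.csv",
}
SESSION_LEVEL = {
    "Prosody": "Prosody/{d}_Session_{s}_prosody.csv",
    "Annotated": "AudioGazeGBack/{d}_Session_{s}_audio_gaze_gback.csv",
}


def dataset_path(root, modality, date, session, participant=None):
    """Build the expected path of one file under the dataset root."""
    if modality in PER_PARTICIPANT:
        if participant is None:
            raise ValueError(f"{modality} files require a participant index")
        relative = PER_PARTICIPANT[modality].format(d=date, s=session, n=participant)
    elif modality in SESSION_LEVEL:
        # Session-level files integrate all participants, so n is not used.
        relative = SESSION_LEVEL[modality].format(d=date, s=session)
    else:
        raise KeyError(f"unknown modality: {modality}")
    return Path(root) / relative
```

With a hypothetical root of `Dataset`, `dataset_path("Dataset", "Motion", "01-28-2022", 1, 2)` yields `Dataset/Mocap/01-28-2022/Session_1_PC_2_mocap_data.txt`; the session and participant values here are illustrative only.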

The AudioGazeGBack directory contains processed, frame-level features that represent each moment of interaction across all participants. These features include speaking activity for each participant, gaze direction labels indicating whether a participant is looking toward the left or right listener relative to the current speaker, and gestural backchannel annotations capturing head nodding and shaking.
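Frame-level speaking activity is often easier to work with once it is collapsed into time intervals. The sketch below shows one way to do that with run-length grouping. It assumes that a participant's speaking activity has already been read into a sequence with one truthy or falsy value per 60 Hz frame; the actual column names and encoding are defined in the README files and are not assumed here. The function name is illustrative.

```python
from itertools import groupby

FRAME_RATE_HZ = 60  # common rate of the frame-based modalities


def active_segments(activity, frame_rate=FRAME_RATE_HZ):
    """Collapse per-frame activity flags into (start_s, end_s) intervals.

    `activity` is any sequence with one truthy/falsy value per frame
    (an assumed encoding). Each returned interval is half-open:
    it starts at the first active frame and ends at the first
    inactive frame that follows.
    """
    segments = []
    frame = 0
    for is_active, run in groupby(activity, key=bool):
        length = sum(1 for _ in run)
        if is_active:
            segments.append((frame / frame_rate, (frame + length) / frame_rate))
        frame += length
    return segments
```

The same grouping applies to any other per-frame binary label, such as a nod or shake flag, provided it is encoded one value per frame.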

Detailed descriptions of column headings, abbreviations, units, and file formats are provided in the accompanying README files. The full processing pipeline used to generate these records is documented and available in the project GitHub repository (https://github.com/MCMartinLee/Multi-TPC) to support transparency and reproducibility.

Files

Dataset.zip (5.5 GB)
md5:e4493b9e97f2046550220672cef2e4ad
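Since the archive is large, it may be worth verifying the download against the published MD5 checksum before extracting it. The following sketch uses Python's standard `hashlib` and reads the file in chunks so the 5.5 GB archive is never held in memory at once. The function name and the local file path are assumptions.

```python
import hashlib

# Checksum published on the record page for Dataset.zip.
EXPECTED_MD5 = "e4493b9e97f2046550220672cef2e4ad"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path="Dataset.zip"):
    """True if the local file matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.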

Additional details

Funding

U.S. National Science Foundation
CHS: Small: An Analysis-and-Synthesis Framework for Small Group Conversations 2005430

Software

Repository URL
https://github.com/MCMartinLee/Multi-TPC
Programming language
Python, MATLAB, C++