Published January 23, 2024 | Version 1.1

THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement and Interaction

  • 1. Aalto University
  • 2. Örebro University
  • 3. Robert Bosch (Germany)
  • 4. University of Stuttgart
  • 5. Technical University of Munich


The THÖR-MAGNI Dataset Tutorials

THÖR-MAGNI is a novel dataset of accurate human and robot navigation and interaction in diverse indoor contexts, building on the protocol of the previous THÖR dataset. We provide position and head orientation motion capture data, 3D LiDAR scans and gaze tracking. In total, THÖR-MAGNI captures 3.5 hours of motion from 40 participants over 5 recording days.

This data collection is designed around systematic variation of factors in the environment, to allow building cue-conditioned models of human motion and verifying hypotheses on factor impact. To that end, THÖR-MAGNI comprises 5 scenarios, some of which are recorded under different conditions (i.e., with one factor varied):

  • Scenario 1 (plus conditions A and B):
    •  Participants move in groups and individually;
    •  Robot as static obstacle;
    •  Environment with 3 obstacles and lane marking on the floor for condition B;
  •  Scenario 2:
    •  Participants move in groups, individually, and transport objects of variable difficulty (i.e., a bucket, boxes and a poster stand);
    •  Robot as static obstacle;
    •  Environment with 3 obstacles;
  • Scenario 3 (plus conditions A and B):
    •  Participants move in groups, individually, and transport objects of variable difficulty (i.e., a bucket, boxes and a poster stand). We denote the roles as: Visitors-Alone, Visitors-Group 2, Visitors-Group 3, Carrier-Bucket, Carrier-Box, Carrier-Large Object;
    •  Teleoperated robot as moving agent: in condition A, the robot moves with differential drive; in condition B, the robot moves with omni-directional drive;
    •  Environment with 2 obstacles;
  • Scenario 4 (plus conditions A and B):
    •  All participants, denoted as Visitors-Alone HRI, interacted with the teleoperated mobile robot;
    •  Robot interacted in two ways: in condition A (Verbal-Only), the Anthropomorphic Robot Mock Driver (ARMoD), a small humanoid NAO robot on top of the mobile platform, only used speech to communicate the next goal point to the participant; in condition B the ARMoD used speech, gestures and robotic gaze to convey the same message;
    •  Free space environment
  • Scenario 5:
    •  Participants move alone (Visitors-Alone), and one of the participants, denoted as Visitors-Alone HRI, transports objects and interacts with the robot;
    •  The ARMoD is remotely controlled by an experimenter and proactively offers help;
    •  Free space environment;

Preliminary steps

Before proceeding, make sure to download the data from Zenodo.

1. Directory Structure

├── docs
│   ├── <- Tutorials document on how to use the data
├── goals_positions.csv <- File with the goal locations
├── maps <- Directory for maps of the environment (PNG files) and offsets (JSON file)
│   ├── offsets.json <- Offsets of the maps with respect to the global coordinate frame origin
│   ├── {date}_SC{sc_id}_map.png <- Maps for `date` in {1205, 1305, 1705, 1805} and `sc_id` in {1A, 1B, 2, 3}
│   ├── 3009_map.png <- Map for Scenarios 4A, 4B and 5
├── CSVs_Scenarios <- Directory with aligned data for all scenarios
│   ├── Scenario_1 <- CSV files for Scenario 1
│   ├── Scenario_2 <- CSV files for Scenario 2
│   ├── Scenario_3 <- CSV files for Scenario 3
│   ├── Scenario_4 <- CSV files for Scenario 4
│   ├── Scenario_5 <- CSV files for Scenario 5
├── TSVs_RAWET <- Directory for the raw eye tracking TSV files for all scenarios
│   ├── synch_info.csv <- Event markers needed to align motion capture with eye tracking data
│   ├── Files <- Directory with all the raw eye tracking TSV files


2. Data Structure and Dataset Files

Within each Scenario directory, each CSV file contains:

2.1. Headers

The CSV file headers contain the metadata of each recording. FILE_ID encodes the date, scenario, condition, and run associated with the recording. The header also states the number of recorded frames (N_FRAMES_QTM), the number of rigid bodies (N_BODIES), and the total number of markers (N_MARKERS).

In addition, it specifies the element order of the contiguous rotation matrix (CONTIGUOUS_ROTATION_MATRIX), the recorded modalities with their measurement units, and the eye tracking devices used in the recording, including their infrared sensor and scene camera frequencies and a flag indicating whether eye tracking data is present.

For the rigid bodies, the header gives their names (BODY_NAMES), role labels (BODY_ROLES), and the number of markers associated with each body (BODY_NR_MARKERS). Finally, it lists all marker names used in the file. Together, this metadata documents the recording settings, the data quantities, and the rigid bodies and markers needed to interpret the CSV files.

2.2. Trajectory Data

The remaining portion of the CSV file integrates merged data from the motion capture system and the eye tracking devices, organized around the participants' helmet rigid bodies. Columns include the XYZ coordinates of all markers, the spatial centroid coordinates, the 6DOF orientation of the object's local coordinate frame, and, if available, eye tracking data comprising 2D/3D gaze coordinates, scene recording frame numbers, eye movement types, and IMU data.

Missing data is denoted by "N/A" or an empty cell. Temporal indexing is provided by the "Time" and "Frame" columns, which hold timestamps and frame numbers. The motion capture system records at 100 Hz; the Tobii Glasses record at 50 Hz (raw gaze) and 25 Hz (scene camera), and the Pupil Glasses at 100 Hz (raw gaze) and 30 Hz (scene camera). The dataset is structured around the motion capture recordings: for each rigid body, such as "Helmet_1", each frame holds the XYZ coordinates of the markers, the centroid coordinates, and a 9-element rotation matrix describing the helmet orientation.

Header               Explanation
Helmet_1 - 1 X       X-coordinate of marker number 1
Helmet_1 - 1 Y       Y-coordinate of marker number 1
Helmet_1 - 1 Z       Z-coordinate of marker number 1
Helmet_1 - [...]     Same for markers 2 and 3 of Helmet_1
Helmet_1 Centroid_X  X-coordinate of the centroid
Helmet_1 Centroid_Y  Y-coordinate of the centroid
Helmet_1 Centroid_Z  Z-coordinate of the centroid
Helmet_1 R0          1st element of the CONTIGUOUS_ROTATION_MATRIX
Helmet_1 R[..]       Same for R1-R7
Helmet_1 R8          9th element of the CONTIGUOUS_ROTATION_MATRIX
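As an illustration of how the rotation columns can be used, the helmet's planar heading can be recovered from R0-R8. This sketch assumes row-major element order, which should be verified against the CONTIGUOUS_ROTATION_MATRIX entry in the file header; the `helmet_heading` helper is hypothetical:

```python
import numpy as np

def helmet_heading(row, body="Helmet_1"):
    """Planar heading (yaw, radians) of a rigid body from its 9-element
    rotation matrix columns R0..R8.

    Assumes row-major element order; check CONTIGUOUS_ROTATION_MATRIX in the
    file header and transpose if the order is column-major. `row` is any
    mapping from column name to value (e.g. a pandas DataFrame row).
    """
    R = np.array([row[f"{body} R{i}"] for i in range(9)], dtype=float).reshape(3, 3)
    # Yaw of the body frame's x-axis projected onto the ground plane.
    return float(np.arctan2(R[1, 0], R[0, 0]))
```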


2.3. Eyetracking Data

The eye tracking data covers 16 participants, providing over 500 minutes of recordings across the different activities and scenarios with three different eye tracking devices. Each device is denoted by a "Tracker_ID" in the dataset:

Tracker ID  Eyetracking Device
TB2         Tobii Glasses 2
TB3         Tobii Glasses 3
PPL         Pupil Invisible Glasses

Gaze points are classified into fixations and saccades using the Tobii I-VT Attention filter, which is optimized for dynamic scenarios and uses a velocity threshold of 100°/s. Eye tracking device calibrations were systematically repeated after each 4-minute recording to account for natural variations in participants' eye shapes and to improve gaze estimation. In addition, gaze estimation adjustments for the Pupil Invisible Glasses were made after each 4-minute recording to mitigate potential drift. Note that the scene cameras of the eye tracking glasses had different fields of view: the Pupil Invisible Glasses provided a 1088x1080 image with horizontal (HFOV) and vertical (VFOV) opening angles of 80°, while the Tobii Glasses provided a 1920x1080 image with different opening angles for Tobii Glasses 3 (HFOV: 95°, VFOV: 63°) and Tobii Glasses 2 (HFOV: 82°, VFOV: 52°).
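Because the resolutions and opening angles differ, the angular resolution of the scene videos differs across devices. A back-of-the-envelope helper for comparing them (a rough small-angle approximation that assumes uniform angular sampling across the image, which real lenses only approximate):

```python
def px_per_degree(width_px: int, hfov_deg: float) -> float:
    """Approximate horizontal pixels per degree of visual angle for a
    scene camera, assuming uniform angular sampling across the image."""
    return width_px / hfov_deg

# Scene cameras listed above:
# Pupil Invisible: 1088 px / 80 deg, Tobii 3: 1920 px / 95 deg, Tobii 2: 1920 px / 82 deg
```

For instance, the Tobii Glasses 3 scene camera resolves roughly 20 px/° horizontally versus about 13.6 px/° for the Pupil Invisible Glasses.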

NOTE: Videos are not part of the dataset; they will be made available in 2024.

For one participant, wearing the Tobii Glasses 3 and Helmet_6, the data would be denoted as:

Header                            Explanation
Helmet_6 - [...]                  X, Y, Z coordinates for 5 markers
Helmet_6 [...]                    X, Y, Z coordinates for the centroid
Helmet_6 R[...]                   9 elements of the CONTIGUOUS_ROTATION_MATRIX
Helmet_6 TB3_Accelerometer_[...]  Accelerometer data along the X, Y, Z axes
Helmet_6 TB3_Gyroscope_[...]      Gyroscope data along the X, Y, Z axes
Helmet_6 TB3_Magnetometer_[...]   Magnetometer data along the X, Y, Z axes
Helmet_6 TB3_G2D_[...]            2D eye tracking data (X, Y)
Helmet_6 TB3_G3D_[...]            3D cyclopean gaze vector (X, Y, Z)
Helmet_6 TB3_Movement             Eye movement type (N/A, Fixation or Saccade)
Helmet_6 TB3_SceneFNr             Frame number of the scene camera recording
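With columns laid out this way, gaze samples of one type can be selected with ordinary pandas indexing. A sketch that extracts the 2D gaze points recorded during fixations; the `_X`/`_Y` suffixes for the G2D columns and the helper name are assumptions to verify against an actual file:

```python
import pandas as pd

def fixation_gaze(df: pd.DataFrame, body: str = "Helmet_6", tracker: str = "TB3"):
    """Rows recorded during fixations, with their 2D gaze coordinates.

    Gaze arrives at the tracker rate (e.g. 50 Hz for TB3), so most 100 Hz
    mocap rows hold N/A in the movement column and are dropped here.
    The G2D column suffixes are assumed; check them on a real file.
    """
    move_col = f"{body} {tracker}_Movement"
    gaze_cols = [f"{body} {tracker}_G2D_X", f"{body} {tracker}_G2D_Y"]
    return df.loc[df[move_col] == "Fixation", gaze_cols]
```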

How to use and tools


This is a dashboard to quickly visualize our data: trajectories, speeds, eye tracking data and LiDAR scans (for Scenario 3). If you cannot use the dashboard on the Streamlit Cloud service, run it locally by following the README file.


To install and use the package, follow the instructions in the README file. The package comprises:

  • 3D trajectory restoration: agents in the scene wore a helmet equipped with markers, which are tracked by the motion capture system. 3D trajectory restoration refers to two ways of aggregating the tracked markers of each helmet: (1) 3D-restoration, which averages the locations of all visible markers, and (2) 3D-best marker, which uses the marker with the longest tracking duration.
  • 3D pre-processing of restored trajectories: interpolation, downsampling and smoothing. To run the 3D pre-processing, check this.
  • trajectory analysis: trajectory-related metrics like tracking duration (in seconds), number of 8s tracklets, motion speed, path efficiency score, and minimal distance between people. To run the trajectory analysis, check this.
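As an example of one of these metrics, path efficiency is commonly defined as the straight-line start-to-end distance divided by the travelled path length; the package's exact formula may differ, so treat this as an illustrative sketch:

```python
import numpy as np

def path_efficiency(xy) -> float:
    """Straight-line start-to-end distance divided by travelled path length
    (1.0 = perfectly straight path). `xy` is an (N, 2) array of positions.
    This is one common definition; the package's formula may differ.
    """
    xy = np.asarray(xy, dtype=float)
    step_lengths = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    total = step_lengths.sum()
    if total == 0.0:
        return 0.0
    return float(np.linalg.norm(xy[-1] - xy[0]) / total)
```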


Files (1.1 GB)


Additional details

Related works

Is described by
Publication: 10.48550/arXiv.2208.14925 (DOI)
Is source of
Conference paper: 10.1109/RO-MAN57019.2023.10309629 (DOI)
Conference proceeding: urn:nbn:se:oru:diva-109508 (URN)


Funding

DARKO – Dynamic Agile Production Robots That Learn and Optimise Knowledge and Operations (grant 101017274), European Commission
Knut and Alice Wallenberg Foundation