Published February 27, 2025 | Version v1
Dataset Open

Sixth Sense: Indoor Human Spatial Awareness Dataset

Description

Related Paper

"Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar", accepted at ARSO 2026.

Work Abstract

Reliable localization of people is fundamental for service and social robots that must operate in close interaction with humans. State-of-the-art human detectors often rely on RGB-D cameras or costly 3D LiDARs. However, most commercial robots are equipped with cameras with a narrow field of view, leaving them unaware of users approaching from other directions, or inexpensive 1D LiDARs whose readings are hard to interpret. To address these limitations, we propose a self-supervised approach to detect humans and estimate their 2D pose from 1D LiDAR data, using detections from an RGB-D camera as supervision. Trained on 70 minutes of autonomously collected data, our model detects humans omnidirectionally in unseen environments with 71% precision, 80% recall, and mean absolute errors of 13 cm in distance and 44° in orientation, measured against ground truth data. Beyond raw detection accuracy, this capability is relevant for robots operating in shared public spaces, where omnidirectional awareness of nearby people is crucial for safe navigation, appropriate approach behavior, and timely human-robot interaction initiation using low-cost, privacy-preserving sensing. Deployment in two additional public environments further suggests that the approach can serve as a practical wide-FOV awareness layer for socially aware service robotics.

Dataset Overview

The Sixth Sense: Indoor Human Spatial Awareness Dataset comprises sensor recordings from a customized PAL Robotics TIAGo equipped with two 1D planar LiDARs and an Azure Kinect camera for human detection. The dataset was collected over nine days in three distinct indoor environments, capturing human activity and spatial interactions.

Data Collection Environments

  • University Corridor: A public transit area between classrooms with study desks and passersby (36k samples). The robot's motion was manually controlled for safety reasons.

  • Break Area: A large indoor space with tables and chairs where expert individuals interact with the robot (12k samples). The robot followed autonomous, randomized trajectories while avoiding obstacles.

  • Lab: A controlled laboratory environment where expert individuals interact with the robot (7k samples). This setup includes high-precision ground truth tracking from an OptiTrack motion capture system.

Data Splits

In the original paper the dataset was divided as follows:

  • Training Set: All University Corridor samples and half of Break Area samples (42k samples).

  • Validation Set: The remaining half of Break Area samples (6k samples).

  • Test Set: All Lab samples (7k samples).
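As a minimal sketch of how the split above could be reproduced, assuming each environment's samples have been loaded as index arrays (the variable names and per-environment counts below are illustrative, taken from the figures listed for each environment):

```python
import numpy as np

# Illustrative per-environment sample counts from the environment descriptions above.
corridor = np.arange(36_000)    # University Corridor samples
break_area = np.arange(12_000)  # Break Area samples
lab = np.arange(7_000)          # Lab samples

# Training set: all Corridor samples plus the first half of Break Area.
half = len(break_area) // 2
train = np.concatenate([corridor, break_area[:half]])

# Validation set: the remaining half of Break Area.
val = break_area[half:]

# Test set: all Lab samples.
test = lab

print(len(train), len(val), len(test))  # 42000 6000 7000
```

Note that the paper does not specify how the Break Area half-split was drawn (e.g., chronologically or at random); the contiguous split here is an assumption.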

Dataset Structure

The dataset is provided as multiple .h5 files, each containing different sensor data recordings. The structure is as follows:

Sensor Data

  • LiDAR Data:

    • scan_raw: Raw LiDAR scan data from the front sensor.

    • scan_raw_back: Raw LiDAR scan data from the rear sensor.

    • scan_virtual_history: History of scans from a virtual LiDAR positioned at the robot's base center; this virtual LiDAR is defined to have 360 laser readings equally spaced around the robot [n_timestamps x 360].

  • Human Detection Data:

    • humans_distance_sensor: Distance measurements to detected humans [n_timestamps x 360].

    • humans_presence_sensor: Presence indicators for detected humans [n_timestamps x 360].

    • humans_relative_bearing_sensor: Bearing angles of detected humans relative to the robot [n_timestamps x 360].

  • Azure Kinect Data:

    • body_tracking_data: Body joint tracking from the Kinect camera [n_timestamps x (32xn_people) x 7].

    • camera_fov_mask: Field of view mask of the Kinect [n_timestamps x 360].

  • OptiTrack Motion Capture Data (available only in the Lab scenario):

    • optitrack__base_footprint_optitrack: Robot’s pose from the motion capture system [n_timestamps x 7].

    • optitrack__person_marker_1: Ground truth marker for person 1 [n_timestamps x 7].

    • optitrack__person_marker_2: Ground truth marker for person 2 [n_timestamps x 7].

    • optitrack__person_marker_3: Ground truth marker for person 3 [n_timestamps x 7].

    • humans_distance_optitrack: Distance measurements to detected humans from OptiTrack [n_timestamps x 360].

    • humans_presence_optitrack: Presence indicators based on OptiTrack data [n_timestamps x 360].

    • humans_relative_bearing_optitrack: Bearing angles of detected humans from OptiTrack [n_timestamps x 360].

  • Odometry Data [3 velocity, 4 relative quaternion, 6 twist]:

    • odom: Raw odometry data [n_timestamps x 13].

    • odom_corrected: Odometry data corrected using front LiDAR data, taken directly from the robot software stack [n_timestamps x 13].

  • Transformation Data (TF Frames):

    • tf_base_link_wrt_map: Robot’s base link relative to the map [n_timestamps x 7].

    • tf_base_link_wrt_odom: Robot’s base link relative to odometry [n_timestamps x 7].

    • tf_azure_kinect_depth_camera_link_wrt_base_link: Kinect camera link relative to the robot’s base [n_timestamps x 7].

    • tf_base_laser_link_wrt_base_link: LiDAR sensor frame relative to the base link [n_timestamps x 7].

    • tf_base_laser_back_link_wrt_base_link: Rear LiDAR sensor frame relative to the base link [n_timestamps x 7].

In all cases the trailing dimension of size 7 is composed as follows: [3 position, 4 orientation quaternion].
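The .h5 files can be read with standard HDF5 tools. The sketch below uses h5py and the dataset key names listed above; it first writes a tiny synthetic file mimicking the documented layout (2 timestamps, 360 angular bins) so that it runs stand-alone, whereas with the real dataset one would simply open one of the provided files. The mapping of bin i to a bearing of i degrees in the robot frame is an assumption for illustration, not a documented convention:

```python
import numpy as np
import h5py

# Build a small synthetic file with the documented key names and shapes.
with h5py.File("example.h5", "w") as f:
    presence = np.zeros((2, 360))
    distance = np.full((2, 360), np.nan)
    presence[0, 90] = 1.0   # one human in bin 90 of the first timestamp
    distance[0, 90] = 1.5   # at 1.5 m from the robot
    f["humans_presence_sensor"] = presence
    f["humans_distance_sensor"] = distance
    f["scan_virtual_history"] = np.random.rand(2, 360)

# Read the file back and convert occupied bins to 2D positions.
with h5py.File("example.h5", "r") as f:
    presence = f["humans_presence_sensor"][:]
    distance = f["humans_distance_sensor"][:]

t = 0
bins = np.nonzero(presence[t])[0]
angles = np.deg2rad(bins)  # assumption: bin i corresponds to i degrees
xy = np.stack([distance[t, bins] * np.cos(angles),
               distance[t, bins] * np.sin(angles)], axis=1)
print(bins, xy)
```

The same pattern applies to the other 360-bin arrays (e.g., humans_relative_bearing_sensor, camera_fov_mask) and, with the [3 position, 4 quaternion] layout, to the pose and TF datasets.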

Usage and Applications

This dataset is designed for research in:

  • Human detection and tracking using LiDAR and depth sensors.

  • Robot spatial awareness and navigation in human environments.

  • Self-supervised learning for human motion prediction.

Citation

If you use this dataset in your research, please cite the related paper listed above.

Acknowledgments

All authors are with the Dalle Molle Institute for Artificial Intelligence (IDSIA), USI-SUPSI, Lugano, 6962, Switzerland (name.surname@supsi.ch).
This work was supported by the European Union through the project SERMAS, by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 22.00247, and by the Swiss National Science Foundation, grant number 213074.

Files

sixth_sense_indoor_human_spatial_awareness_dataset.zip (3.5 GB)

Additional details

Additional titles

Alternative title
Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar

Funding

European Commission
SERMAS - Socially-acceptable Extended Reality Models and Systems 101070351
Swiss National Science Foundation
213074