Published June 3, 2025 | Version v1 | Dataset | Open

EPFL-Smart-Kitchen-30 Annotations and Poses
Creators
- Bonnetto, Andy (Data collector) 1
- Qi, Haozhe (Data collector) 1
- Leong, Franklin (Data collector) 1
- Tashkovska, Matea (Project member)
- Rad, Mahdi (Project member) 2
- Shokur, Solaiman (Project member) 3
- Hummel, Friedhelm (Project manager) 1
- Micera, Silvestro (Project manager) 4, 1
- Pollefeys, Marc (Project manager) 5, 6
- Mathis, Alexander (Project leader) 1
Description
# The EPFL-Smart-Kitchen-30
> ⚠️ Videos and other collected data can be found at https://zenodo.org/records/15535461
Understanding behavior requires datasets that capture humans while carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, as many complex actions are naturally exhibited in kitchens, from chopping to cleaning. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected in a noninvasive motion capture platform inside a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs) and one head-mounted HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset with synchronized exocentric, egocentric, depth, IMU, eye gaze, body and hand kinematics spanning 29.7 hours of 16 subjects cooking four different recipes. Action sequences were densely annotated with 33.78 action segments per minute. Leveraging this multi-modal dataset, we propose four benchmarks to advance behavior understanding and modeling through
1) a vision-language benchmark,
2) a semantic text-to-motion generation benchmark,
3) a multi-modal action recognition benchmark,
4) a pose-based action segmentation benchmark.
## General information
* **Authors**: Andy Bonnetto 1, Haozhe Qi 1, Franklin Leong 1, Matea Tashkovska 1, Mahdi Rad 3, Solaiman Shokur 1,3, Friedhelm Hummel 1,4,5, Silvestro Micera 1,3, Marc Pollefeys 2,6, Alexander Mathis 1
* **Affiliation**: 1 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 2 Microsoft, 3 Scuola Superiore Sant’Anna, Pisa, 4 Swiss Federal Institute of Technology Valais (EPFL Valais), Clinique Romande de Réadaptation, Sion, 5 University of Geneva Medical School, Geneva, 6 Eidgenössische Technische Hochschule (ETH), Zürich
* **Date of collection**: 05.2023 - 01.2024 (MM.YYYY - MM.YYYY)
* **Geolocation data**: Campus Biotech, Genève, Switzerland
* **Associated publication URL**: https://arxiv.org/abs/2506.01608
* **Funding**: Our work was funded by EPFL and Microsoft Swiss Joint Research Center and a Boehringer Ingelheim Fonds PhD stipend (H.Q.). We are grateful to the Brain Mind Institute for providing funds for the cameras and to the Neuro-X Institute for providing funds to annotate data.
## Dataset availability
* **License**: This dataset is released under the non-commercial [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/legalcode) license.
* **Citation**: Please cite the associated publication when using our data.
* **Repository URL**: https://github.com/amathislab/EPFL-Smart-Kitchen
* **Repository DOI**: 10.5281/zenodo.15551913
* **Dataset version**: v1
## Data and files overview
* **Data preparation**: unzip `Public_release_pose.zip`
* **Repository structure**:
```
Public_release_pose
├── README.md
├── train
| ├── YH2002 (participant)
| | ├── 2023_12_04_10_15_23 (session)
| | | ├── annotations
| | | | ├── action_annotations.xlsx
| | | | └── activity_annotations.json
| | | ├── pose_3d
| | | | ├── pose3d_mano.csv
| | | | └── pose3d_smpl.csv
| | └── ...
| └── ...
└── test
└── ...
```
* `train` and `test`: contain the train and test data for the action recognition, action segmentation and full-body motion generation tasks. These folders are organized by participant and session (a traversal sketch follows below). Each session contains two modalities:
  * **annotations**: action and activity annotation data.
  * **pose_3d**: 3D pose estimates for the hands (MANO) and the body (SMPL).
> We refer the reader to the associated publication for details about data processing and tasks description.
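
The unzipped folder could be traversed roughly as follows. This is a minimal sketch in Python; it only assumes the archive has been extracted to a local `Public_release_pose/` directory matching the tree above.

```python
# Sketch: enumerate all sessions in the unzipped Public_release_pose folder.
# Assumes the directory layout shown in the tree above.
from pathlib import Path

root = Path("Public_release_pose")

for split in ("train", "test"):
    for participant in sorted((root / split).iterdir()):
        if not participant.is_dir():
            continue
        for session in sorted(p for p in participant.iterdir() if p.is_dir()):
            annotations = session / "annotations"
            pose_3d = session / "pose_3d"
            print(split, participant.name, session.name,
                  annotations.exists(), pose_3d.exists())
```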
### Naming conventions
* Exocentric camera names are: `output0`, `Aoutput0`, `Aoutput1`, `Aoutput2`, `Aoutput3`, `Boutput0`, `Boutput1`, `Boutput2`, `Boutput3`.
* Participants are identified by `YH` followed by a random identifier; sessions are named by the recording date and time (a small parsing helper is sketched below).
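
Since session folders encode the recording date and time, they can be parsed directly. The helper below is hypothetical; the format string is inferred from the example folder name in the tree above.

```python
# Hypothetical helper: parse a session folder name such as "2023_12_04_10_15_23"
# (year_month_day_hour_minute_second) into a datetime object.
from datetime import datetime

def parse_session_name(name: str) -> datetime:
    return datetime.strptime(name, "%Y_%m_%d_%H_%M_%S")

print(parse_session_name("2023_12_04_10_15_23"))  # 2023-12-04 10:15:23
```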
### File characteristics
* `action_annotations.xlsx`: table with the following fields (a loading sketch for all three file types follows this list):
  * Start: start time of the action, in seconds
  * End: end time of the action, in seconds
  * Verbs: annotated verb for the segment
  * Nouns: annotated noun for the segment
  * Confusion: annotator confusion for this segment (0-1)
* `activity_annotations.json`: JSON file with the following fields:
  * datetime: time of the annotation
  * video_file: annotated session (corresponds to all cameras)
  * annotations:
    * start: start time of the action, in seconds
    * end: end time of the action, in seconds
    * Activities: annotated activity
* `pose3d_mano.csv` and `pose3d_smpl.csv`: 3D pose estimates for the hands and the body, with the following fields:
  * kp3ds: 3D keypoints (42 keypoints for the hands (left/right) and 17 keypoints for the body)
  * left_poses/right_poses: pose parameters of the fitted mesh model
  * left_RH/right_RH: rotation matrices of the fitted mesh model
  * left_TH/right_TH: translation matrices of the fitted mesh model
  * left_shapes/right_shapes: shape parameters of the fitted mesh model
> We refer the reader to the associated publication for details about data processing and tasks description.
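
The files described above could be loaded along these lines. This is a sketch rather than the official loader: it assumes `pandas` (with `openpyxl` for the spreadsheet), uses the example session path from the tree above, and leaves the serialization of array-valued columns such as `kp3ds` untouched since it is not specified here.

```python
# Sketch: load annotations and 3D poses for one session.
# Field/column names follow the lists above; exact on-disk formats may differ.
import json
import pandas as pd

session = "Public_release_pose/train/YH2002/2023_12_04_10_15_23"

# Action segments: Start, End, Verbs, Nouns, Confusion
actions = pd.read_excel(f"{session}/annotations/action_annotations.xlsx")

# Activity segments: datetime, video_file, annotations [{start, end, Activities}]
with open(f"{session}/annotations/activity_annotations.json") as f:
    activities = json.load(f)

# 3D pose fits: MANO (hands, 42 keypoints) and SMPL (body, 17 keypoints)
hand_pose = pd.read_csv(f"{session}/pose_3d/pose3d_mano.csv")
body_pose = pd.read_csv(f"{session}/pose_3d/pose3d_smpl.csv")

print(actions.columns.tolist())
print(len(activities["annotations"]), "annotated activity segments")
print(hand_pose.columns.tolist(), body_pose.columns.tolist())
```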
## Methodological information
**Benchmark evaluation code**: will be available soon.
> We refer the reader to the associated publication for details about data processing and tasks description.
## Acknowledgements
Our work was funded by EPFL and Microsoft Swiss Joint Research Center and a Boehringer Ingelheim Fonds PhD stipend (H.Q.). We are grateful to the Brain Mind Institute for providing funds for the cameras and to the Neuro-X Institute for providing funds to annotate data.
## Change log (DD.MM.YYYY)
[03.06.2025]: First data release!
Files (12.7 GB)

| Name | Size | MD5 |
|---|---|---|
| Public_release_pose.zip | 12.7 GB | md5:efe89cbd7d6b6d091d8cf9bf60c9bd69 |
| README.md | 5.9 kB | md5:e258f489d63e662a2874a63d8d586001 |
Additional details

Related works
- Continues: Dataset 10.5281/zenodo.15535461 (DOI)
- Is published in: Preprint arXiv:2506.01608 (arXiv)

Funding
- Swiss National Science Foundation: Joint behavior and neural data modeling for naturalistic behavior (10000950)

Software
- Development status: Active