Published November 29, 2021 | Version v1
Dataset Open

gaze and visual focus of attention in conversation and manipulation settings

  • 1. Idiap Research Institute



This database contains automatically extracted features (head pose, gaze, speaking status, people/object positions) as well as manual annotations (visual focus of attention, grasp/release actions) from three different datasets:

  • UBImpressed: 5-10 minute dyadic interactions in a formal setting (8 sessions, i.e. 16 videos used, partially annotated)

    • Website:

    • Reference: Muralidhar et al. 2016. Training on the Job: Behavioral Analysis of Job Interviews in Hospitality. In ACM International Conference on Multimodal Interaction (ICMI’16).

  • KTH-Idiap Group-Interviewing Corpus: 1-hour-long four-party meetings in a more relaxed setting (5 sessions, i.e. 20 videos used, partially annotated)

    • Reference: Oertel et al. 2014. Who will get the grant? In International Conference on Multimodal Interaction Workshop (UMMI).

  • ManiGaze: a set of short gazing and manipulation tasks performed in front of a robot (16 sessions used, partially annotated)

    • Website:

    • Reference: R. Siegfried, B. Aminian and J.-M. Odobez, « ManiGaze: a Dataset for Evaluating Remote Gaze Estimator in Object Manipulation Situations », ACM Symposium on Eye Tracking Research and Applications (ETRA) 2020


The conversation datasets were manually annotated with the visual focus of attention (VFOA): around 3000 annotated frames per video in the UBImpressed dataset and around 9000 annotated frames per video in the KTH-Idiap Group-Interviewing dataset. The ManiGaze dataset provides VFOA ground truth for its gazing sessions (51 targets per subject) and action annotations for its object manipulation session (22 actions per subject). See the reference paper for details on feature extraction.
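To give a rough sense of the annotation volume, the per-video figures above can be multiplied by the number of videos used. The Python sketch below does only that. The dictionary layout and function name are illustrative assumptions, not part of the dataset, and the totals are order-of-magnitude estimates because the per-video counts are approximate and the videos are only partially annotated.

```python
# Approximate figures copied from the dataset description.
# The structure and names below are hypothetical, chosen for illustration only.
CONVERSATION_SETS = {
    "UBImpressed": {"videos": 16, "frames_per_video": 3000},
    "KTH-Idiap Group-Interviewing": {"videos": 20, "frames_per_video": 9000},
}


def estimate_annotated_frames(datasets):
    """Return a rough count of VFOA-annotated frames per dataset."""
    return {
        name: info["videos"] * info["frames_per_video"]
        for name, info in datasets.items()
    }


estimates = estimate_annotated_frames(CONVERSATION_SETS)
for name, frames in estimates.items():
    print(f"{name}: about {frames} annotated frames")
print(f"Total: about {sum(estimates.values())} annotated frames")
```

Under these assumptions the conversation datasets together hold on the order of a couple of hundred thousand annotated frames, with the KTH-Idiap corpus contributing most of them.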

This database was collected mainly to support experiments on visual focus of attention estimation and gaze estimation calibration.



Rémy Siegfried and Jean-Marc Odobez, « Robust Unsupervised Gaze Calibration using Conversation and Manipulation Attention Priors », ACM Transactions on Multimedia Computing, Communications, and Applications, 2021




Files (213.5 MB)


Additional details

Related works


European Commission: MuMMER – MultiModal Mall Entertainment Robot (grant 688147)
Swiss National Science Foundation: Robot skills acquisition through active learning and social interaction strategies (ROSALIS) (grant 200021_172627)