Published February 7, 2025 | Version 1.0
Dataset | Open Access

A multi-modal explainability approach for human-aware robots in multi-party conversation - Data

  • Italian Institute of Technology
  • Comenius University Bratislava

Description

# Info about the project

## Name and publication

The name of this project, XAE, stands for Explainable Addressee Estimation. The data contained in this repository were used or obtained in the study 
"A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation", published in Computer Vision and Image Understanding (preprint: https://doi.org/10.48550/arXiv.2407.03340).

Reference:
Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, Art. no. 104304. doi: 10.1016/j.cviu.2025.104304.

The project builds upon a previous study by Mazzola et al. (2023), doi: https://doi.org/10.1109/IJCNN54540.2023.10191452.

## Description

### Highlights
- Improvement over the state-of-the-art performance on the Vernissage Addressee Estimation dataset.
- Design of an inherently explainable architecture for Addressee Estimation.
- Integration and validation of the explainable model on the humanoid robot iCub.
- Transparent architecture for human-activity recognition in multi-party conversation.
- User study analyzing how human participants perceive the quality of explanations.

### Abstract

Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition 
in multi-party conversation scenarios. Specifically, in the field of human–robot interaction, it becomes even more crucial 
to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, 
restricting the robot's capability to the estimation of whether it was addressed or not, which limits its interactive skills. 
For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. 
Explainable artificial intelligence thus plays a significant role in current machine learning applications and models, 
as it provides explanations for their decisions in addition to excellent performance. 
In our work, we (a) present an addressee estimation model with improved performance in comparison with the previous state of the art; 
(b) further modify this model to include inherently explainable attention-based segments; 
(c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; 
(d) validate the real-time performance of the explainable model in multi-party human–robot interaction; 
(e) propose several ways to incorporate explainability and transparency in the aforementioned architecture; 
and (f) perform an online user study to analyze the effect of various explanations on how human participants perceive the robot.

# Content

This repository contains the data used and obtained in the project.
A parallel repository (https://doi.org/10.5281/zenodo.14883488) contains the code used in the project.
The personal data collected with iCub to 
1) retrain the XAE model and improve the real-time addressee estimation accuracy on the robot, and
2) test the accuracy of the model in real-time interactions
can be accessed only under a Data Sharing Agreement. For more information, please write to alessandra.sciutti@iit.it.

The content of the repository follows the structure of the paper.

## A_Design_explainable_AE_model

Files contained in this folder refer to points (a) and (b) of the abstract, i.e., the design and training of several deep neural networks developed to solve the addressee-direction estimation task.
In this folder, you can find the checkpoints of the models trained on the Vernissage dataset (Jayagopi et al., 2012).

Some of the models were trained using both the Vernissage dataset and an additional set of data collected through the robot's cameras, in order to improve the performance of the model deployed on iCub. 
These data are personal data, hence protected and accessible only through a Data Sharing Agreement (DSA) with the Italian Institute of Technology (please contact alessandra.sciutti@iit.it for this).
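
Before use, the checkpoints can be inspected as a quick sanity check. The sketch below is only an assumption-based example: it assumes the checkpoints are standard PyTorch files, and the file name "checkpoint.pt" is hypothetical (substitute a real file from the folder; the matching model definitions would come from the parallel code repository).

```python
# Minimal sketch for inspecting a checkpoint from A_Design_explainable_AE_model.
# Assumption: checkpoints are standard PyTorch files; "checkpoint.pt" is a
# hypothetical file name.
import torch

ckpt = torch.load("A_Design_explainable_AE_model/checkpoint.pt", map_location="cpu")

# A checkpoint is typically either a bare state_dict or a dict wrapping one
# under a key such as "state_dict"; unwrap if needed.
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# Print each parameter's name and shape to see the architecture layout.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```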

## B_Implementation_in_robotic_architecture

Files contained in this folder refer to points (c), (d), and (e) of the abstract. 
The "log_files_interactions" folder contains the log files obtained from the robotic architecture during the interaction of iCub with human participants in 7 different sessions.
The "analysis" folder contains:
- in the "realtime_performance" folder, the .csv files obtained by processing the log info and used to assess the performance of the architecture in addressee estimation using visual information (see the loading sketch after this list)
- in the "LLMs" folder, the transcriptions of the conversations and the results of the LLMs' performance on the addressee estimation task using textual information

## C_User_evaluation

Files contained in this folder refer to point (f) of the abstract. 
Besides the video of the experiment used for the online survey (video_experiment.mp4, in the "video_recording" folder), this directory contains the files used for the statistical analysis of the data collected through the online user study.
In the "online_user_study" folder:
- raw_data_onlinestudy.csv contains the raw data of the survey
- df_mixed_model.csv is a datasheet of the data processed and ordered for the statistical analysis (mixed-effects models); see the sketch after this list
- Analysis_CVIU.omv is a Jamovi file that contains the analyses and their results
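
For readers without Jamovi, a comparable mixed-effects analysis can be approximated in Python with statsmodels. This is only a sketch under assumptions: the column names ("rating", "explanation_type", "participant") are hypothetical and must be replaced with the actual headers of df_mixed_model.csv.

```python
# Minimal sketch of a linear mixed-effects model on df_mixed_model.csv,
# approximating the mixed-model analysis run in Jamovi. Column names are
# hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("C_User_evaluation/online_user_study/df_mixed_model.csv")

# Fixed effect: type of explanation; random intercept: one per participant.
model = smf.mixedlm("rating ~ explanation_type", data=df,
                    groups=df["participant"])
result = model.fit()
print(result.summary())
```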

# Contribution

- Iveta Bečková and Štefan Pócoš worked on section A_Design_explainable_AE_model.
- Carlo Mazzola, Giulia Belgiovine, and Omar Eldardeer worked on section B_Implementation_in_robotic_architecture.
- Carlo Mazzola and Marco Matarese worked on section C_User_evaluation.

# License
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://joinup.ec.europa.eu/licence/creative-commons-attribution-40-international-cc-40


# Citation

Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, Art. no. 104304. doi: 10.1016/j.cviu.2025.104304.

Files (158.7 MB)

A_Design_explainable_AE_model.zip

- md5:c8bd57322d8eb7342c4dd4365f465093 (26.5 MB)
- md5:4ba2b6aef9b0f3b88bb9418c7596b2d0 (691.9 kB)
- md5:8706d723019b7a6edc441fd7ae3a05bd (131.5 MB)
- md5:5117d1c90ac10e0d4ae4a96476e006dc (6.4 kB)
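
To check the integrity of a downloaded archive against the checksums above, a short MD5 sketch follows; the pairing of file name and checksum shown in the comment is illustrative, so match each archive with its own checksum from the list.

```python
# Minimal sketch to verify a download against the MD5 checksums listed above.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5sum("A_Design_explainable_AE_model.zip"))
# Compare the printed value with the expected checksum from the file list,
# e.g. c8bd57322d8eb7342c4dd4365f465093 (illustrative pairing).
```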

Additional details

Related works

- Is compiled by: Software, 10.5281/zenodo.14883488 (DOI)
- Is published in: Journal article, 10.1016/j.cviu.2025.104304 (DOI)
- Is published in: Journal article, 10.48550/arXiv.2407.03340 (DOI)

Funding

- European Commission: TERAIS - Towards Excellent Robotics and Artificial Intelligence at a Slovak university (grant 101079338)
- Slovak Research and Development Agency: APVV-21-0105