A multi-modal explainability approach for human-aware robots in multi-party conversation - Data
# Info about the project
## Name and publication
The name of this project, XAE, stands for Explainable Addressee Estimation. The data contained in this repository were used or obtained in the study
"A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation", published in Computer Vision and Image Understanding (preprint: https://doi.org/10.48550/arXiv.2407.03340).
Reference:
Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, pp. 104304. doi: 10.1016/j.cviu.2025.104304.
The project builds upon a previous study by Mazzola et al. (2023), doi: https://doi.org/10.1109/IJCNN54540.2023.10191452
## Description
### Highlights
- Improvement of the state-of-the-art performance on the Vernissage Addressee Estimation dataset.
- Design of an inherently explainable architecture for Addressee Estimation.
- Integration and validation of the explainable model on the humanoid robot iCub.
- Transparent architecture for human-activity recognition in multi-party conversation.
- User study to analyze how human participants perceive the quality of explanations.
### Abstract
Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition
in multi-party conversation scenarios. Specifically, in the field of human–robot interaction, it becomes even more crucial
to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task,
restricting the robot’s capability to estimate whether it was addressed or not, which limits its interactive skills.
For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability.
Explainable artificial intelligence thus plays a significant role in current machine learning applications and models,
providing explanations for their decisions alongside excellent performance.
In our work, we (a) present an addressee estimation model with improved performance in comparison with the previous state-of-the-art;
(b) further modify this model to include inherently explainable attention-based segments;
(c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot;
(d) validate the real-time performance of the explainable model in multi-party human–robot interaction;
(e) propose several ways to incorporate explainability and transparency in the aforementioned architecture;
and (f) perform an online user study to analyze the effect of various explanations on how human participants perceive the robot.
# Content
This repository contains the data used and obtained in the project.
A parallel repository (https://doi.org/10.5281/zenodo.14883488) contains the code used in the project.
The personal data collected with iCub to
1) retrain the XAE model and improve the real-time addressee estimation accuracy on the robot and
2) test the accuracy of the model in real-time interactions
can be accessed only under a Data Access Agreement. For more information, please write to alessandra.sciutti@iit.it
The content of the repository follows the structure of the paper.
## A_Design_explainable_AE_model
Files contained in this folder refer to points (a) and (b) of the Abstract, i.e., the design and training of several deep neural networks developed to solve the addressee-direction estimation task.
In this folder, you can find the checkpoints of the models trained on the Vernissage Dataset (Jayagopi et al., 2012).
Some of the models were trained using the Vernissage Dataset together with an additional set of data collected through the robot's cameras, in order to improve the performance of the model deployed on iCub.
These data are personal data, hence protected and accessible only through a Data Sharing Agreement (DSA) with the Italian Institute of Technology (please contact alessandra.sciutti@iit.it for this).
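As a minimal sketch of how the released checkpoints might be inspected, assuming they are standard PyTorch checkpoint files (the path below is a placeholder, not an actual file name in the archive):

```python
import torch

# Hypothetical path: replace with an actual checkpoint extracted from
# A_Design_explainable_AE_model.zip
checkpoint_path = "A_Design_explainable_AE_model/checkpoints/model.pt"

# Load on CPU so no GPU is required just to inspect the file
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# A checkpoint is typically a state_dict (a dict of tensors) or a dict
# containing one; listing its keys shows the stored layers/metadata
if isinstance(checkpoint, dict):
    for key in checkpoint:
        print(key)
```

The corresponding model definitions and training scripts are in the code repository linked above (https://doi.org/10.5281/zenodo.14883488).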
## B_Implementation_in_robotic_architecture
Files contained in this folder refer to points (c), (d), and (e) of the Abstract.
The "log_files_interactions" folder contains the log_files obtained from the robotic architecture during the interaction of iCub with human participants in 7 different sessions.
The "analysis" folder contains:
- in "realtime_performance" folder, the .csv files obtain by processing the log info and used to assess the performance of the architecture in addressee estimation using visual info
- in "LLMs" folder, the transcriptions of the conversation and the results of the LLMs performance on the addressee estimation task using textual info
## C_User_evaluation
Files contained in this folder refer to point (f) of the Abstract.
Besides the video of the experiment used for the online survey (video_experiment.mp4, in the "video_recording" folder), this directory contains the files used for the statistical analysis of the data collected through the online user study.
In the "online_user_study" folder:
- raw_data_onlinestudy.csv contains the raw data of the survey
- df_mixed_model.csv contains the data processed and ordered for the statistical analysis (mixed-effects models; see the sketch after this list)
- Analysis_CVIU.omv is a Jamovi file that contains the analyses and their results.
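As an illustration of the kind of mixed-effects analysis stored in Analysis_CVIU.omv, an analogous model could be fitted in Python; the column names below are placeholders, not the actual variables in df_mixed_model.csv:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("online_user_study/df_mixed_model.csv")

# Hypothetical formula: a perception rating explained by the explanation
# condition, with a random intercept per participant
model = smf.mixedlm("rating ~ condition", data=df, groups=df["participant_id"])
result = model.fit()
print(result.summary())
```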
# Contribution
Iveta Bečková and Štefan Pócoš worked on section A_Design_explainable_AE_model
Carlo Mazzola, Giulia Belgiovine and Omar Eldardeer worked on section B_Implementation_in_robotic_architecture
Carlo Mazzola and Marco Matarese worked on section C_User_evaluation
# License
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://joinup.ec.europa.eu/licence/creative-commons-attribution-40-international-cc-40
# Citation
Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, pp. 104304. doi: 10.1016/j.cviu.2025.104304.
# Related works
- Is compiled by: Software, doi: 10.5281/zenodo.14883488
- Is published in: Journal article, doi: 10.1016/j.cviu.2025.104304
- Is published in: Journal article, doi: 10.48550/arXiv.2407.03340