Published February 7, 2025 | Version 1.0
Dataset | Open Access

A multi-modal explainability approach for human-aware robots in multi-party conversation - Data

  • Italian Institute of Technology
  • Comenius University Bratislava

Description

# Info about the project

## Name and publication

The name of this project, XAE, stands for Explainable Addressee Estimation. The data contained in this repository were used or obtained in the study 
"A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation", published in Computer Vision and Image Understanding (preprint: https://doi.org/10.48550/arXiv.2407.03340).

Reference:
Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, Art. no. 104304. doi: 10.1016/j.cviu.2025.104304.

The project builds upon a previous study by Mazzola et al. (2023), doi: https://doi.org/10.1109/IJCNN54540.2023.10191452.

## Description

### Highlights
- Improvement over the state-of-the-art performance on the Vernissage Addressee Estimation dataset.
- Design of an inherently explainable architecture for Addressee Estimation.
- Integration and validation of the explainable model on the humanoid robot iCub.
- Transparent architecture for human-activity recognition in multi-party conversation.
- User study analyzing how human participants perceive the quality of explanations.

### Abstract

Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition 
in multi-party conversation scenarios. Specifically, in the field of human–robot interaction, it becomes even more crucial 
to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, 
restricting the robot's capability to the estimation of whether it was addressed or not, which limits its interactive skills. 
For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. 
Explainable artificial intelligence thus plays a significant role in current machine learning applications and models, 
as it provides explanations for their decisions in addition to excellent performance. 
In our work, we (a) present an addressee estimation model with improved performance in comparison with the previous state of the art; 
(b) further modify this model to include inherently explainable attention-based segments; 
(c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; 
(d) validate the real-time performance of the explainable model in multi-party human–robot interaction; 
(e) propose several ways to incorporate explainability and transparency in the aforementioned architecture; 
and (f) perform an online user study to analyze the effect of various explanations on how human participants perceive the robot.

# Content

This repository contains the data used and obtained in the project.
A parallel repository (https://doi.org/10.5281/zenodo.14883488) contains the code used in the project.
The personal data collected with iCub to 
1) retrain the XAE model and improve the real-time addressee estimation accuracy on the robot, and
2) test the accuracy of the model in real-time interactions
can be accessed only under a Data Sharing Agreement. For more information, please write to alessandra.sciutti@iit.it.

The content of the repository follows the structure of the paper.

## A_Design_explainable_AE_model

Files contained in this folder refer to points (a) and (b) of the abstract, i.e., the design and training of several deep neural networks developed to solve the addressee-direction estimation task.
In this folder, you can find the checkpoints of the models trained on the Vernissage dataset (Jayagopi et al., 2012).

Some of the models were trained using both the Vernissage dataset and an additional set of data collected through the robot's cameras, in order to improve the performance of the model deployed on iCub. 
These data are personal data, hence protected and accessible only through a Data Sharing Agreement (DSA) with the Italian Institute of Technology (please contact alessandra.sciutti@iit.it for this).
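
Before use, the checkpoints can be inspected as a quick sanity check. The sketch below is only an assumption-based example: it assumes the checkpoints are standard PyTorch files, and the file name "checkpoint.pt" is hypothetical (substitute a real file from the folder; the matching model definitions would come from the parallel code repository).

```python
# Minimal sketch for inspecting a checkpoint from A_Design_explainable_AE_model.
# Assumption: checkpoints are standard PyTorch files; "checkpoint.pt" is a
# hypothetical file name.
import torch

ckpt = torch.load("A_Design_explainable_AE_model/checkpoint.pt", map_location="cpu")

# A checkpoint is typically either a bare state_dict or a dict wrapping one
# under a key such as "state_dict"; unwrap if needed.
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# Print each parameter's name and shape to see the architecture layout.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```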

## B_Implementation_in_robotic_architecture

Files contained in this folder refer to points (c), (d), and (e) of the abstract. 
The "log_files_interactions" folder contains the log files obtained from the robotic architecture during the interaction of iCub with human participants in 7 different sessions.
The "analysis" folder contains:
- in the "realtime_performance" folder, the .csv files obtained by processing the log info and used to assess the performance of the architecture in addressee estimation using visual information (see the loading sketch after this list)
- in the "LLMs" folder, the transcriptions of the conversations and the results of the LLMs' performance on the addressee estimation task using textual information

## C_User_evaluation

Files contained in this folder refer to point (f) of the abstract. 
Besides the video of the experiment used for the online survey (video_experiment.mp4, in the "video_recording" folder), this directory contains the files used for the statistical analysis of the data collected through the online user study.
In the "online_user_study" folder:
- raw_data_onlinestudy.csv contains the raw data of the survey
- df_mixed_model.csv is a datasheet of the data processed and ordered for the statistical analysis (mixed-effects models); see the sketch after this list
- Analysis_CVIU.omv is a Jamovi file that contains the analyses and their results
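
For readers without Jamovi, a comparable mixed-effects analysis can be approximated in Python with statsmodels. This is only a sketch under assumptions: the column names ("rating", "explanation_type", "participant") are hypothetical and must be replaced with the actual headers of df_mixed_model.csv.

```python
# Minimal sketch of a linear mixed-effects model on df_mixed_model.csv,
# approximating the mixed-model analysis run in Jamovi. Column names are
# hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("C_User_evaluation/online_user_study/df_mixed_model.csv")

# Fixed effect: type of explanation; random intercept: one per participant.
model = smf.mixedlm("rating ~ explanation_type", data=df,
                    groups=df["participant"])
result = model.fit()
print(result.summary())
```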

# Contribution

- Iveta Bečková and Štefan Pócoš worked on section A_Design_explainable_AE_model.
- Carlo Mazzola, Giulia Belgiovine, and Omar Eldardeer worked on section B_Implementation_in_robotic_architecture.
- Carlo Mazzola and Marco Matarese worked on section C_User_evaluation.

# License
Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://joinup.ec.europa.eu/licence/creative-commons-attribution-40-international-cc-40


# Citation

Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, and Carlo Mazzola. "A multi-modal explainability approach for human-aware robots in multi-party conversation." Computer Vision and Image Understanding, vol. 253, 2025, Art. no. 104304. doi: 10.1016/j.cviu.2025.104304.

Files (158.7 MB)

A_Design_explainable_AE_model.zip

- md5:c8bd57322d8eb7342c4dd4365f465093 (26.5 MB)
- md5:4ba2b6aef9b0f3b88bb9418c7596b2d0 (691.9 kB)
- md5:8706d723019b7a6edc441fd7ae3a05bd (131.5 MB)
- md5:5117d1c90ac10e0d4ae4a96476e006dc (6.4 kB)
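
To check the integrity of a downloaded archive against the checksums above, a short MD5 sketch follows; the pairing of file name and checksum shown in the comment is illustrative, so match each archive with its own checksum from the list.

```python
# Minimal sketch to verify a download against the MD5 checksums listed above.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5sum("A_Design_explainable_AE_model.zip"))
# Compare the printed value with the expected checksum from the file list,
# e.g. c8bd57322d8eb7342c4dd4365f465093 (illustrative pairing).
```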

Additional details

Related works

- Is compiled by: Software, 10.5281/zenodo.14883488 (DOI)
- Is published in: Journal article, 10.1016/j.cviu.2025.104304 (DOI)
- Is published in: Journal article, 10.48550/arXiv.2407.03340 (DOI)

Funding

- European Commission: TERAIS - Towards Excellent Robotics and Artificial Intelligence at a Slovak university (grant 101079338)
- Slovak Research and Development Agency: APVV-21-0105