XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms

Bobek, Szymon; Korycińska, Paloma; Krakowska, Monika; Mozolewski, Maciej; Rak, Dorota; Zych, Magdalena; Wójcik, Magdalena; Nalepa, Grzegorz J.

doi:10.5281/zenodo.15222484

Published April 15, 2025 | Version 1.0.2

Dataset Open

1. Jagiellonian University
2. Uniwersytet Jagielloński w Krakowie
3. Uniwersytet Jagiellonski w Krakowie

Contributors

Data collector (5):

Data curator (3):

Editor:

Bobek, Szymon²

1. Uniwersytet Jagielloński w Krakowie
2. Jagiellonian University
3. Uniwersytet Jagiellonski w Krakowie

We present a dataset created during a user study on the evaluation of the explainability of artificial intelligence (AI), conducted at the Jagiellonian University as a collaboration between the computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore how AI model patterns can be explained effectively to diverse audiences.

The dataset contains material collected from 39 participants during interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups: domain experts in the field of mycology (DE), students with a data science and visualization background (IT), and students from social sciences and humanities (SSH). Each group was given explanations of a machine learning model trained to classify mushrooms as edible or non-edible, and was asked to interpret the explanations and answer various questions during the interview. The machine learning model and the explanations of its decisions were prepared by the computer science research team.

The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results of the thematic analysis, and the original explanations with the modifications suggested by the participants. The dataset is complemented by source code that allows one to reproduce the initial machine learning model and explanations.

The general structure of the dataset is described in the following table. Files whose names contain [RR]_[SS]_[NN] hold the individual results obtained from a particular participant. The parts of this identifier have the following meaning:

RR - initials of the researcher conducting the interview,
SS - type of the participant (DE for domain expert, SSH for social sciences and humanities students, or IT for computer science students),
NN - number of the participant
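The naming convention above can be parsed mechanically. The following is a minimal sketch, not part of the dataset tooling: the regular expression assumes that the initials contain no underscore and that the participant number is numeric, and the names parse_participant_id and ParticipantId are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: initials contain no underscore and the number consists of digits only.
_ID_PATTERN = re.compile(
    r"^(?P<researcher>[^_]+)_(?P<group>DE|SSH|IT)_(?P<number>\d+)$"
)


class ParticipantId(NamedTuple):
    researcher: str  # RR: initials of the researcher conducting the interview
    group: str       # SS: DE, SSH or IT
    number: int      # NN: number of the participant


def parse_participant_id(name: str) -> ParticipantId:
    """Split an ID such as 'XX_DE_01' (or a file name 'XX_DE_01.csv') into parts."""
    stem = name.rsplit(".", 1)[0]  # drop a file extension if one is present
    match = _ID_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a participant ID: {name!r}")
    return ParticipantId(
        match["researcher"], match["group"], int(match["number"])
    )
```

For example, parse_participant_id("XX_DE_01.csv") yields the researcher initials "XX", the group "DE" and the number 1, where "XX" is a made-up value used only for illustration.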

File	Description
SURVEY.csv	Results of the survey completed by 149 candidates, 39 of whom were selected to form the final group of participants.
SURVEY_en.csv	Content of SURVEY.csv translated into English.
CODEBOOK.csv	The codebook used in the thematic analysis and MAXQDA coding.
QUESTIONS.csv	List of questions that the participants were asked during the interviews.
SLIDES.csv	List of slides used in the study, with their interpretation and references to the MAXQDA themes and the VISUALIZATION_MODIFICATIONS table.
MAXQDA_SUMMARY.csv	Summary of the thematic analysis for each participant, using the codes from CODEBOOK.csv.
PROBLEMS.csv	List of problems that the participants were asked to solve during the interviews. They correspond to three instances from the dataset that the participants had to classify using the knowledge gained from the explanations.
PROBLEMS_en.csv	Content of PROBLEMS.csv translated into English.
PROBLEMS_RESPONSES.csv	Each participant's responses to the problems listed in PROBLEMS.csv.
VISUALIZATION_MODIFICATIONS.csv	Information on how each participant modified the order of the slides, which slides (explanations) were removed, and what kind of additional explanation was suggested.
ORIGINAL_VISUALIZATIONS.pdf	PDF file containing the visualizations of explanations presented to the participants during the interviews.
ORIGINAL_VISUALIZATIONS_EN.pdf	Content of ORIGINAL_VISUALIZATIONS.pdf translated into English.
VISUALIZATION_MODIFICATIONS.zip	Archive of PDF files containing the original slides from ORIGINAL_VISUALIZATIONS.pdf with the modifications suggested by each participant. Each file is named with the participant ID, i.e. [RR]_[SS]_[NN].pdf.
TRANSCRIPTS.zip	Anonymized transcripts of the interviews, one per participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv.

The detailed structure of the files listed in the table above is given in the Technical info section.
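The per-participant transcripts in TRANSCRIPTS.zip can be read without unpacking the archive. The sketch below is one possible approach under stated assumptions: the transcripts are comma-separated, UTF-8 encoded CSV files with a header row, and the function name read_transcripts is hypothetical.

```python
import csv
import io
import zipfile


def read_transcripts(archive):
    """Return {participant_id: list of row dicts} for every CSV in the archive.

    `archive` is a path or a binary file-like object. The encoding and the
    comma delimiter are assumptions; adjust them if the files differ.
    """
    transcripts = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # The file name without folder and extension is the participant ID.
            participant_id = info.filename.rsplit("/", 1)[-1][:-4]
            with zf.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                transcripts[participant_id] = list(csv.DictReader(text))
    return transcripts
```

Calling read_transcripts("TRANSCRIPTS.zip") on a downloaded copy of the archive would return one list of rows per participant, keyed by the [RR]_[SS]_[NN] identifier.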

The source code used to train the ML model and to generate the explanations is available on GitLab (https://gitlab.geist.re/pro/xai-fungi).

Technical info

The following sections describe the records in the particular files. We describe only the CSV files, which have a tabular structure. The content of the PDF files is self-explanatory and needs no additional description.

SURVEY

Column name	Column description
candidate_id	Unique ID of a survey respondent. This ID is not used in any other file; use participant_id to join records across the CSV files.
[set of columns corresponding to survey content]	These columns are self-explanatory: each column title contains the question the respondent was asked, and each cell contains the answer given. The set also includes survey metadata, such as the time the respondent spent filling in the survey and the dates on which it was filled in.
participant_id	Unique identifier of an individual who participated in the interviews. Use this column to join records from other tables.
comment	Additional comments about participants. In the current version, these comments only indicate which candidates took part in the pilot study and were not included in the final dataset.
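Because only some survey respondents were interviewed, a common first step is to keep the survey rows that carry a participant_id and index them by that key. The following is a minimal sketch that assumes candidates who were not interviewed have an empty participant_id cell; the function name and the sample rows are hypothetical.

```python
import csv
import io


def interviewed_participants(survey_rows):
    """Map participant_id -> survey row, skipping rows without a participant_id.

    Assumption: candidates who were not interviewed have an empty cell here.
    """
    return {
        row["participant_id"].strip(): row
        for row in survey_rows
        if (row.get("participant_id") or "").strip()
    }


# Made-up sample in the documented layout (real survey columns omitted).
sample = io.StringIO(
    "candidate_id,participant_id,comment\n"
    "1,XX_DE_01,\n"
    "2,,\n"
    "3,YY_IT_02,\n"
)
index = interviewed_participants(csv.DictReader(sample))
```

The resulting dictionary can then be used to look up the survey answers for any participant_id found in the other CSV files.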

CODEBOOK

Column name	Column description
code	Name of a code used in the thematic analysis, with the hierarchy encoded in the name. The > character separates hierarchy levels. For instance, Aesthetics > layout is a code on the second level of the hierarchy, with Aesthetics as its parent.
memo	The meaning of a given code.
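The hierarchy encoded in the code column can be recovered by splitting on the > character. This is a small sketch; the helper names split_code, code_level and parent_code are hypothetical.

```python
def split_code(code: str) -> list:
    """Split a hierarchical code such as 'Aesthetics > layout' into its levels."""
    return [part.strip() for part in code.split(">")]


def code_level(code: str) -> int:
    """Return the depth of the code, where a top-level code has level 1."""
    return len(split_code(code))


def parent_code(code: str):
    """Return the parent code, or None for a top-level code."""
    parts = split_code(code)
    return " > ".join(parts[:-1]) if len(parts) > 1 else None
```

With the example from the table, split_code("Aesthetics > layout") gives the two levels "Aesthetics" and "layout", and parent_code returns "Aesthetics".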

SLIDES

Column name	Column description
slide_id	Unique ID of a slide, which allows joining TRANSCRIPTS with the other tables that use slide IDs.
maxqda_theme	Name of the explanation type used in the thematic analysis; it appears among the columns of MAXQDA_SUMMARY.csv.
slide_name	Slide name used in VISUALIZATION_MODIFICATIONS.csv.
comment	Explanation of the content of the slide.

MAXQDA_SUMMARY

Column name	Column description
code	Name of the code, which can be matched with CODEBOOK.csv.
[list of columns corresponding to a particular type of explanation, e.g., LIME, descriptive statistics, etc.]	Number of occurrences of a given code in a given type of explanation.
participant_id	ID of the participant for whom the summary was prepared.
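Since MAXQDA_SUMMARY holds one row per code and participant, totals across participants can be obtained by summing the explanation-type columns. The sketch below assumes that cells hold whole numbers or are empty; the function name and the sample column names are illustrative only.

```python
from collections import defaultdict


def total_code_counts(rows, id_columns=("code", "participant_id")):
    """Sum the occurrences of each code per explanation type over all participants.

    Every column not listed in `id_columns` is treated as an explanation type.
    Assumption: cells contain whole numbers or are empty (counted as 0).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            totals[row["code"]][column] += int(float(value)) if value else 0
    return {code: dict(counts) for code, counts in totals.items()}
```

The rows can come straight from csv.DictReader; the result maps each code to a dictionary of explanation type and total count.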

PROBLEMS

Column name	Column description
problem_id	Unique ID of the problem, which can be used to join this table with other tables.
[set of features for a given mushroom used in a particular problem]	Feature values of the mushroom to be classified.
model_class	Class returned by the machine learning model.
model_probability	Probability assigned by the machine learning model to the prediction.

PROBLEMS_RESPONSES

Column name	Column description
problem_id	Unique ID of the problem that can be used to join the responses with PROBLEMS table and other tables.
participant_id	Unique ID of the participant, which can be used as a key to join with other tables.
prediction_decision	The class assigned by the participant to the given problem.
prediction_decision_en	English translation of a prediction decision given by the participant
prediction_certainty	The certainty of the participant's decision.
prediction_certainty_en	English translation of a prediction certainty given by the participant
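The problem_id key makes it possible to place each participant response next to the model output for the same problem. The following sketch shows one way to do this with plain dictionaries; the function name is hypothetical, and values are kept as strings because the exact label vocabulary and number format in the files are not assumed here.

```python
def attach_model_output(responses, problems):
    """Add model_class and model_probability to each response row.

    Rows are dictionaries, e.g. from csv.DictReader. Responses whose
    problem_id is not present in `problems` are skipped.
    """
    by_problem = {problem["problem_id"]: problem for problem in problems}
    joined = []
    for response in responses:
        problem = by_problem.get(response["problem_id"])
        if problem is None:
            continue
        joined.append(
            {
                **response,
                "model_class": problem["model_class"],
                "model_probability": problem["model_probability"],
            }
        )
    return joined
```

Comparing prediction_decision_en with model_class afterwards requires first checking that both columns use the same class labels.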

VISUALIZATION_MODIFICATIONS

Column name	Column description
participant_id	Unique ID of the participant, which can be used to join this table with other tables.
slide_id	ID of the slide, as used in other tables.
original_order	The original position of the slide in the presentation. For custom slides, which were added to the presentation by the participant, this field is empty.
new_order	The position of the slide in the presentation as assigned by the participant.
slide_name	Symbolic name of the slide.
modification	The type of modification suggested by the participant: 0 means no modification, 1 means removal, and 2 means addition of a custom slide.
details	Details of a custom slide added to the presentation. For all other slides this field is empty.
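The modification codes and the new_order column together describe the presentation each participant ended up with. The sketch below reconstructs that order under two assumptions: removed slides (modification 1) are excluded, and new_order holds an integer for every kept slide. The function name and the label dictionary are hypothetical.

```python
# Labels follow the column description: 0, 1 and 2.
MODIFICATION_LABELS = {
    0: "no modification",
    1: "removal",
    2: "addition of custom slide",
}


def final_slide_order(rows, participant_id):
    """Return the slide names kept by one participant, sorted by new_order.

    Assumptions: removed slides are dropped, and rows with an empty
    new_order are ignored because they cannot be placed.
    """
    kept = [
        row
        for row in rows
        if row["participant_id"] == participant_id
        and int(row["modification"]) != 1
        and row["new_order"] != ""
    ]
    kept.sort(key=lambda row: int(row["new_order"]))
    return [row["slide_name"] for row in kept]
```

Rows read with csv.DictReader can be passed in directly, since the function converts the numeric columns itself.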

QUESTIONS

Column name	Column description
question_id	Unique ID of the question, which allows matching the actual question text with the transcripts.
realted_slide_id	ID of the slide that the question was originally assigned to.
question	The actual text of the question
question_en	English translation of the question

TRANSCRIPTS

Column name	Column description
speaker_id	ID of the person who authored the text in the text column. This essentially distinguishes the investigator from the participant.
slide_id	ID of the slide, matching the ID in SLIDES.csv, to which the following text relates. The row in which a slide ID appears marks the point at which that slide was shown on the participant's screen. Four special slide IDs identify the stages of the interview: __S00__ marks the beginning of the core part of the interview (slide analysis), before which the study is introduced to the participant; __S99__ marks the beginning of the section in which the participants analyze the visualization order (slide IDs are not assigned in this section because the participants switched slides dynamically); __S15__ marks the end of the slide analysis section; __S88__ marks the beginning of the problem-solving section.
question_id	ID of a question, which can be matched with QUESTIONS.csv. It marks the place where the question was asked or where the participant started answering it. Not all questions from QUESTIONS.csv can be matched with the transcripts, because some participants did not answer some of the questions, or the answer could not be tagged precisely enough.
problem_id	ID of a problem that the participants were asked to solve, which can be matched with PROBLEMS.csv. The appearance of the ID in a row indicates that from this point on the participant was trying to solve the problem.
text	The anonymized transcript of the speaker's words, obtained with MS Teams.
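Because a slide ID appears only in the row where the slide comes on screen, later rows need the most recent ID carried forward. The sketch below does this and also labels the interview stage from the special markers. It assumes the marker values appear in the files exactly as written in the table above (including the underscores), which should be checked against the data; the stage labels, the function name and the sample slide ID are hypothetical.

```python
# Assumption: marker strings match the cell values in the transcripts exactly.
STAGE_MARKERS = {
    "__S00__": "slide analysis",
    "__S15__": "after slide analysis",
    "__S99__": "visualization order",
    "__S88__": "problem solving",
}


def annotate_transcript(rows):
    """Add 'stage' and 'current_slide' to every row of one transcript.

    Rows before the first marker get the stage 'introduction'. A marker row
    switches the stage and resets the current slide; any other non-empty
    slide_id becomes the current slide until the next one appears.
    """
    stage = "introduction"
    current_slide = None
    annotated = []
    for row in rows:
        slide_id = (row.get("slide_id") or "").strip()
        if slide_id in STAGE_MARKERS:
            stage = STAGE_MARKERS[slide_id]
            current_slide = None
        elif slide_id:
            current_slide = slide_id
        annotated.append({**row, "stage": stage, "current_slide": current_slide})
    return annotated
```

The same carry-forward idea can be applied to question_id and problem_id, which are also recorded only at the row where a question or problem starts.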

Notes (En)

Release notes

1.0.2 — Typos fixed in the VISUALIZATION_MODIFICATIONS.csv file
1.0.1 — English translations added for Polish-only columns and content (for preview for the international community)
1.0.0 — Original dataset

Files (31.7 MB)

Name	MD5	Size
CODEBOOK.csv	b90cac76f0e5df3aabcb1478322830b9	13.1 kB
MAXQDA_SUMMARY.csv	8cfd50bec23d6459c632cc475eb966ea	174.2 kB
ORIGINAL_VISUALIZATIONS.pdf	c2eac1cf62d2325d7af99b05116b00c1	1.3 MB
ORIGINAL_VISUALIZATIONS_EN.pdf	2e1d01641b0f4e160944c1a7c5283891	1.3 MB
PROBLEMS.csv	78fdd4cb137fb233f6dff966c06bc374	1.1 kB
PROBLEMS_en.csv	779a8ae7904f4a8d2cafed812a04bcaa	755 Bytes
PROBLEMS_RESPONSES.csv	aa642e278c460e72dbf94033cb4dec71	5.8 kB
QUESTIONS.csv	d871a460524b3e223c4adaabc32beed6	6.6 kB
SLIDES.csv	a7e07e0c3942cd71057de7e70d4bf4c2	1.8 kB
SURVEY.csv	367e558dbb12e48007738fd7a31e4cbf	128.6 kB
SURVEY_en.csv	5526cc7acb1245c326bef7085c217dd1	125.8 kB
TRANSCRIPTS.zip	83396f076ebe16e00450c0d15e8281d1	633.9 kB
VISUALIZATION_MODIFICATIONS.csv	b34d0cf7bb6b5ee40ce6b0271cf11af7	14.1 kB
VISUALIZATION_MODIFICATIONS.zip	ee3df0e76526ef14ab385e4b990cdc23	28.0 MB
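The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using the Python standard library; the function names md5_of_file and verify_download are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file at `path` has the expected MD5 checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download("CODEBOOK.csv", "b90cac76f0e5df3aabcb1478322830b9") should return True for an unmodified copy of CODEBOOK.csv from this version of the record.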

Additional details

Is described by: Journal article: 10.1038/s41597-025-05167-6 (DOI)
Is source of: Journal article: 10.1016/j.ijhcs.2025.103625 (DOI)

National Science Centre
XPM - Explainable Predictive Maintenance 2020/02/Y/ST6/00070
Ministry of Science and Higher Education
Priority Research Area (DigiWorld) under the Strategic Programme Excellence Initiative at Jagiellonian University. ID.UJ

Repository URL: https://gitlab.geist.re/pro/xai-fungi
Programming language: Python
