XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms
Creators
- 1. Jagiellonian University
- 2. Uniwersytet Jagielloński w Krakowie
- 3. Uniwersytet Jagiellonski w Krakowie
Contributors
Data collectors:
Data curators:
Editor:
- 1. Uniwersytet Jagielloński w Krakowie
- 2. Jagiellonian University
- 3. Uniwersytet Jagiellonski w Krakowie
Description
We present a dataset created during a user study on the evaluation of explainability of artificial intelligence (AI), conducted at the Jagiellonian University as a collaboration between the computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore how to explain AI model patterns effectively to diverse audiences.
The dataset contains material collected from 39 participants during interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups: domain experts in the field of mycology (DE), students with a data science and visualization background (IT), and students from social sciences and humanities (SSH). Each group was given explanations of a machine learning model trained to classify mushrooms as edible or non-edible, and was asked to interpret the explanations and answer various questions during the interview. The machine learning model and the explanations of its decisions were prepared by the computer science research team.
The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results of thematic analysis, and the original explanations with modifications suggested by the participants. The dataset is complemented with source code that allows one to reproduce the initial machine learning model and explanations.
The general structure of the dataset is described in the following table. Files whose names contain [RR]_[SS]_[NN] hold the individual results obtained from a particular participant. The parts of this identifier have the following meaning:
- RR - initials of the researcher conducting the interview,
- SS - type of the participant (DE for domain expert, SSH for social sciences and humanities students, or IT for computer science students),
- NN - number of the participant.
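The naming scheme above can be parsed programmatically. The following is a minimal sketch in Python, assuming that the three parts are separated by underscores exactly as written, that RR consists of letters, and that NN consists of digits. The helper name `parse_participant_id` and the sample ID are illustrative and not taken from the dataset.

```python
import re
from typing import NamedTuple


class ParticipantId(NamedTuple):
    researcher: str  # RR: initials of the researcher conducting the interview
    group: str       # SS: DE, SSH, or IT
    number: int      # NN: number of the participant


# Assumption: initials are letters only and the participant number is digits only.
_ID_PATTERN = re.compile(r"^([A-Za-z]+)_(DE|SSH|IT)_(\d+)$")


def parse_participant_id(name: str) -> ParticipantId:
    """Split an ID such as 'XX_DE_01', optionally followed by a file extension."""
    stem = name.rsplit(".", 1)[0] if "." in name else name
    match = _ID_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a participant ID: {name!r}")
    researcher, group, number = match.groups()
    return ParticipantId(researcher, group, int(number))


# 'XX_SSH_07.csv' is a made-up file name used only for illustration.
print(parse_participant_id("XX_SSH_07.csv"))
```

Raising an error on names that do not match keeps unrelated files from being silently treated as participant records.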
| File | Description |
|---|---|
| SURVEY.csv | The results of a survey filled in by 149 candidates, 39 of whom were selected to form the final group of participants. |
| SURVEY_en.csv | Content of the SURVEY translated into English. |
| CODEBOOK.csv | The codebook used in thematic analysis and MAXQDA coding. |
| QUESTIONS.csv | List of questions that the participants were asked during interviews. |
| SLIDES.csv | List of slides used in the study with their interpretation and references to the MAXQDA themes and the VISUALIZATION_MODIFICATIONS table. |
| MAXQDA_SUMMARY.csv | Summary of the thematic analysis for each participant, using the codes from CODEBOOK. |
| PROBLEMS.csv | List of problems that participants were asked to solve during interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from explanations. |
| PROBLEMS_en.csv | Content of the PROBLEMS file translated into English. |
| PROBLEMS_RESPONSES.csv | The responses of each participant to the problems listed in PROBLEMS.csv. |
| VISUALIZATION_MODIFICATIONS.csv | Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested. |
| ORIGINAL_VISUZALIZATIONS.pdf | The PDF file containing the visualizations of explanations presented to the participants during the interviews. |
| ORIGINAL_VISUZALIZATIONS_EN.pdf | Content of ORIGINAL_VISUZALIZATIONS translated into English. |
| VISUALIZATION_MODIFICATIONS.zip | An archive of PDF files containing the original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant. Each file is a PDF named with the participant ID, i.e. [RR]_[SS]_[NN].pdf. |
| TRANSCRIPTS.zip | The anonymized transcripts of the interviews with each participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv. |
The detailed structure of the files listed in the table above is given in the Technical info section.
The source code used to train the ML model and to generate the explanations is available on GitLab (https://gitlab.geist.re/pro/xai-fungi).
Technical info
The following sections describe the records in the particular files. We describe only the CSV files, which have a tabular structure. The content of the PDF files is self-explanatory and does not need additional description.
SURVEY
| Column name | Column description |
|---|---|
| candidate_id | Unique ID of a survey participant. Note that this ID is not used in the other files; use participant_id instead to join records from the CSV files. |
| [set of columns corresponding to survey content] | These columns are self-explanatory: the column title contains the question the participant was asked, and the cell contains the answer given. The set also includes survey metadata, such as the time the participant spent filling in the survey and the dates when the survey was filled in. |
| participant_id | Unique identifier of an individual who participated in the interviews. This column should be used to join records from other tables. |
| comment | Additional comments about participants. In the current version, these comments only indicate which candidates took part in the pilot study and were not included in the final dataset. |
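Since participant_id is the join key, a first step when working with the survey is often to select the rows of candidates who were interviewed. The sketch below uses only the Python standard library and an invented excerpt of the file. It assumes that candidates who were not interviewed have an empty participant_id, which should be checked against the real data.

```python
import csv
import io

# Hypothetical excerpt of SURVEY.csv; the real file has many more columns,
# and these IDs and comments are invented for illustration.
SAMPLE = """candidate_id,participant_id,comment
1,XX_DE_01,
2,,
3,XX_IT_02,
4,,pilot study
"""


def interviewed(rows):
    """Keep the rows whose participant_id is filled in.

    Assumption: candidates who were not interviewed have an empty participant_id.
    """
    return [row for row in rows if (row.get("participant_id") or "").strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
selected = interviewed(rows)
print([row["participant_id"] for row in selected])
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle for SURVEY.csv or SURVEY_en.csv.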
CODEBOOK
| Column name | Column description |
|---|---|
| code | The name of a code used in thematic analysis, with the hierarchy encoded in the name. The > character separates hierarchy levels. For instance, Aesthetics > layout represents a code on the second level of the hierarchy, with Aesthetics as its parent. |
| memo | The meaning of a given code. |
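The hierarchy encoded in the code column can be unpacked by splitting on the > separator. A minimal sketch follows; the function names `split_code` and `parent_of` are illustrative and not part of the dataset's source code.

```python
from typing import List, Optional


def split_code(code: str) -> List[str]:
    """Return the hierarchy levels of a code, from the top level down."""
    return [level.strip() for level in code.split(">")]


def parent_of(code: str) -> Optional[str]:
    """Return the parent code, or None for a top-level code."""
    levels = split_code(code)
    if len(levels) < 2:
        return None
    return " > ".join(levels[:-1])


print(split_code("Aesthetics > layout"))
print(parent_of("Aesthetics > layout"))
```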
SLIDES
| Column name | Column description |
|---|---|
| slide_id | Unique ID of a slide, which allows joining TRANSCRIPTS with the other tables that use slide IDs. |
| maxqda_theme | The name of the explanation type used in thematic analysis and present in the columns of MAXQDA_SUMMARY.csv. |
| slide_name | Slide name used in VISUALIZATION_MODIFICATIONS.csv. |
| comment | An explanation of the content of the slide. |
MAXQDA_SUMMARY
| Column name | Column description |
|---|---|
| code | Name of the code, which can be matched with CODEBOOK.csv. |
| [list of columns corresponding to a particular type of explanation, e.g., LIME, descriptive statistics, etc.] | Number of occurrences of a given code in a given type of explanation. |
| participant_id | ID of the participant for whom the summary was prepared. |
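Because the summary is stored in a wide layout (one column per explanation type), aggregating code counts requires summing over all columns other than code and participant_id. The sketch below shows one way to do this. The sample rows and counts are invented, and the column names LIME and descriptive statistics are taken from the examples in the table above. It also assumes that counts are stored as integers and that missing counts are empty cells.

```python
from collections import defaultdict

# Invented rows shaped like MAXQDA_SUMMARY.csv after csv.DictReader.
rows = [
    {"code": "Aesthetics > layout", "LIME": "2",
     "descriptive statistics": "1", "participant_id": "XX_DE_01"},
    {"code": "Aesthetics > layout", "LIME": "0",
     "descriptive statistics": "3", "participant_id": "XX_IT_02"},
    {"code": "Aesthetics", "LIME": "1",
     "descriptive statistics": "", "participant_id": "XX_IT_02"},
]


def total_per_code(rows, id_columns=("code", "participant_id")):
    """Sum occurrences of each code over all explanation types and participants."""
    totals = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            if column in id_columns or value in ("", None):
                continue  # skip key columns and empty cells
            totals[row["code"]] += int(value)
    return dict(totals)


totals = total_per_code(rows)
print(totals)
```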
PROBLEMS
| Column name | Column description |
|---|---|
| problem_id | Unique ID of the problem, which can be used to join the table with other tables. |
| [set of features for a given mushroom used in a particular problem] | Feature values of the mushroom to be classified. |
| model_class | Class returned by the machine learning model. |
| model_probability | Probability assigned by the machine learning model to the prediction. |
PROBLEMS_RESPONSES
| Column name | Column description |
|---|---|
| problem_id | Unique ID of the problem, which can be used to join the responses with the PROBLEMS table and other tables. |
| participant_id | Unique ID of the participant, which can be used as a key to join with other tables. |
| prediction_decision | The class assigned by the participant to the given problem. |
| prediction_decision_en | English translation of the prediction decision given by the participant. |
| prediction_certainty | The certainty of the participant's decision. |
| prediction_certainty_en | English translation of the prediction certainty given by the participant. |
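The problem_id key makes it possible to put each participant's decision next to the model's prediction for the same mushroom. The following standard-library sketch illustrates such a join. All cell values (problem IDs, class labels, certainty levels) are invented, since the actual value formats are not described here.

```python
import csv
import io

# Invented excerpts; the real PROBLEMS.csv also contains mushroom feature columns.
PROBLEMS = """problem_id,model_class,model_probability
P1,edible,0.91
P2,poisonous,0.77
"""
RESPONSES = """problem_id,participant_id,prediction_decision_en,prediction_certainty_en
P1,XX_DE_01,edible,high
P2,XX_DE_01,edible,low
"""


def join_on_problem(problems, responses):
    """Attach the model's class and probability to every participant response."""
    by_id = {problem["problem_id"]: problem for problem in problems}
    joined = []
    for response in responses:
        problem = by_id.get(response["problem_id"])
        if problem is None:
            continue  # response refers to a problem that is not listed
        joined.append({
            **response,
            "model_class": problem["model_class"],
            "model_probability": float(problem["model_probability"]),
        })
    return joined


problems = list(csv.DictReader(io.StringIO(PROBLEMS)))
responses = list(csv.DictReader(io.StringIO(RESPONSES)))
joined = join_on_problem(problems, responses)
for record in joined:
    print(record["participant_id"], record["problem_id"],
          record["prediction_decision_en"], record["model_class"])
```

Whether participant decisions and model classes use the same label vocabulary is not stated in this description, so any direct comparison of the two columns should first check the values that actually occur.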
VISUALIZATION_MODIFICATIONS
| Column name | Column description |
|---|---|
| participant_id | Unique ID of the participant, which can be used to join the table with other tables. |
| slide_id | ID of a slide used in other tables. |
| original_order | The original position of a given slide in the presentation. For custom slides that were added to the presentation by the participant, this field is empty. |
| new_order | The position (place in the presentation) of the slide assigned by the participant. |
| slide_name | Symbolic name of the slide. |
| modification | The type of modification suggested by the participant, where 0 means no modification, 1 means removal, and 2 means addition of a custom slide. |
| details | Details of a custom slide added to the presentation. For other slides this field is empty. |
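The modification codes and the new_order column together describe the presentation as each participant would have arranged it. The sketch below reconstructs that order for one participant. The rows and slide names are invented, and the code assumes that new_order holds integers and may be empty for removed slides.

```python
# Meaning of the modification codes, as given in the table above.
MODIFICATION_LABELS = {0: "no modification", 1: "removal", 2: "addition of custom slide"}

# Invented rows for a single participant, shaped like csv.DictReader output.
rows = [
    {"slide_name": "intro", "original_order": "1", "new_order": "2", "modification": "0"},
    {"slide_name": "lime", "original_order": "2", "new_order": "", "modification": "1"},
    {"slide_name": "stats", "original_order": "3", "new_order": "1", "modification": "0"},
    {"slide_name": "custom", "original_order": "", "new_order": "3", "modification": "2"},
]


def participant_order(rows):
    """Return slide names in the order chosen by the participant, skipping removals."""
    kept = [
        row for row in rows
        if int(row["modification"]) != 1 and row["new_order"] != ""
    ]
    kept.sort(key=lambda row: int(row["new_order"]))
    return [row["slide_name"] for row in kept]


order = participant_order(rows)
print(order)
```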
QUESTIONS
| Column name | Column description |
|---|---|
| question_id | Unique ID of each question, which allows matching the actual question text with the transcripts. |
| realted_slide_id | The ID of the slide that the question was originally assigned to. |
| question | The actual text of the question |
| question_en | English translation of the question |
TRANSCRIPTS
| Column name | Column description |
|---|---|
| speaker_id | ID of the person who authored the text in the text column. This essentially distinguishes the investigator from the participant. |
| slide_id | ID of the slide, matching the ID in SLIDES.csv, that the following text relates to. The row in which a slide ID appears marks the point at which that slide appeared on the participant's screen. Four special slide IDs identify the stages of the interview: __S00__ marks the beginning of the core part of the interview with slide analysis (before it, the study is introduced to the participant); __S99__ marks the beginning of the section in which the participants analyze the visualization order (slide IDs are not assigned in this section because participants switched slides dynamically); __S15__ marks the end of the slide analysis section; __S88__ marks the beginning of the problem-solving section. |
| question_id | ID of a question that can be matched with QUESTIONS.csv. It marks the place where the question was asked or where the participant started answering it. Not all questions from QUESTIONS can be matched with the transcripts, as some participants did not answer some of the questions, or the tagging of an answer was too vague. |
| problem_id | ID of a problem that the participants were asked to solve, which can be matched with PROBLEMS.csv. The appearance of the ID in a row indicates that from this point in time the participant tried to solve the problem. |
| text | The anonymized transcript of the participant's words obtained with MS Teams. |
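Because a slide ID is recorded only in the row where the slide first appears, assigning every utterance to a slide requires carrying the last seen ID forward. The following sketch shows this forward fill. It assumes that rows without a new slide have an empty slide_id cell. The speaker labels, slide IDs, and text are invented for illustration.

```python
# Invented transcript rows, shaped like csv.DictReader output.
rows = [
    {"speaker_id": "I", "slide_id": "", "text": "Welcome to the study."},
    {"speaker_id": "I", "slide_id": "S00", "text": "Let us begin."},
    {"speaker_id": "P", "slide_id": "", "text": "OK."},
    {"speaker_id": "I", "slide_id": "S01", "text": "Here is the next slide."},
]


def forward_fill(rows, column="slide_id"):
    """Copy the most recent non-empty value of `column` into the following rows.

    Rows before the first marker get None, since no slide has been shown yet.
    """
    current = None
    filled = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            current = value
        filled.append({**row, column: current})
    return filled


filled = forward_fill(rows)
print([row["slide_id"] for row in filled])
```

The same function can be applied with `column="question_id"` or `column="problem_id"`, although for those columns it is a modelling choice whether an utterance should inherit the previous tag.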
Files
(31.7 MB)
| Checksum | Size |
|---|---|
| md5:b90cac76f0e5df3aabcb1478322830b9 | 13.1 kB |
| md5:8cfd50bec23d6459c632cc475eb966ea | 174.2 kB |
| md5:c2eac1cf62d2325d7af99b05116b00c1 | 1.3 MB |
| md5:2e1d01641b0f4e160944c1a7c5283891 | 1.3 MB |
| md5:78fdd4cb137fb233f6dff966c06bc374 | 1.1 kB |
| md5:779a8ae7904f4a8d2cafed812a04bcaa | 755 Bytes |
| md5:aa642e278c460e72dbf94033cb4dec71 | 5.8 kB |
| md5:d871a460524b3e223c4adaabc32beed6 | 6.6 kB |
| md5:a7e07e0c3942cd71057de7e70d4bf4c2 | 1.8 kB |
| md5:367e558dbb12e48007738fd7a31e4cbf | 128.6 kB |
| md5:5526cc7acb1245c326bef7085c217dd1 | 125.8 kB |
| md5:83396f076ebe16e00450c0d15e8281d1 | 633.9 kB |
| md5:b34d0cf7bb6b5ee40ce6b0271cf11af7 | 14.1 kB |
| md5:ee3df0e76526ef14ab385e4b990cdc23 | 28.0 MB |
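The MD5 checksums listed above can be used to verify that a downloaded file is intact. A minimal sketch using Python's `hashlib` follows. The helper names are illustrative, and the demonstration hashes an empty temporary file (whose MD5 digest is well known) rather than a real file from the dataset.

```python
import hashlib
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Demonstration on an empty temporary file, not on a dataset file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty_path = tmp.name

print(matches(empty_path, "md5:d41d8cd98f00b204e9800998ecf8427e"))
```

For a real check, `empty_path` would be replaced by the path of the downloaded file and the second argument by the corresponding checksum from the list.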
Additional details
Related works
- Is described by: Journal article, 10.1038/s41597-025-05167-6 (DOI)
Funding
Software
- Repository URL: https://gitlab.geist.re/pro/xai-fungi
- Programming language: Python