Published October 2, 2025 | Version v1
Journal article | Open Access

A systematic review of human-centered explainability in reinforcement learning: transferring the RCC framework to support epistemic trustworthiness

  • 1. Universität der Bundeswehr München
  • 2. Czech Academy of Sciences, Institute of Philosophy

Description

This paper presents a systematic review of explainable reinforcement learning methodologies, with an emphasis on human-centered evaluation frameworks. Drawing on literature from 2017 to 2025, we apply and extend the Reasons, Confidence, and Counterfactuals (RCC) framework—originally designed for supervised learning—to reinforcement learning contexts. Our analysis reveals two predominant explanatory strategies: constructive, in which explicit explanations are generated, and supportive, in which users must infer the agent's reasoning from visual or textual cues. The review also examines human-factor considerations such as task complexity, explanation formats, and evaluation methodologies. For the latter in particular, our analysis shows that improvement in decision quality is rarely measured.

Files

Article_Moll_&_Dorsch_A_systematic_review_of_human-centered_explainability_in_reinforcement_learning_AAM.pdf.pdf

Additional details

Related works

Is published in
Journal article: 10.1007/s42454-025-00084-w (DOI)

Funding

European Commission
CETE-P - Establishing the Center for Environmental and Technology Ethics - Prague (grant 101086898)

Dates

Accepted: 2025-09-16 (Author Accepted Manuscript)