A systematic review of human-centered explainability in reinforcement learning: transferring the RCC framework to support epistemic trustworthiness
Description
This paper presents a systematic review of explainable reinforcement learning methodologies, with an emphasis on human-centered evaluation frameworks. Drawing on literature published between 2017 and 2025, we apply and extend the Reasons, Confidence, and Counterfactuals (RCC) framework, originally designed for supervised learning, to reinforcement learning contexts. Our analysis reveals two predominant explanatory strategies: constructive, in which explicit explanations are generated, and supportive, in which users must infer the reasoning from the visual or textual cues provided. Our review also emphasizes human-factor considerations such as task complexity, explanation format, and evaluation methodology. Regarding evaluation methodology in particular, our analysis shows that improvement in decision quality is rarely measured.
Files

| Name | Size | Checksum |
|---|---|---|
| Article_Moll_&_Dorsch_A_systematic_review_of_human-centered_explainability_in_reinforcement_learning_AAM.pdf.pdf | 296.1 kB | md5:73c70724adb1267902d3721e761688f1 |
Additional details
Related works
- Is published in: Journal article, 10.1007/s42454-025-00084-w (DOI)
Dates
- Accepted: 2025-09-16 (AAM)