Multi-Agent Explainable Trading System (MAETS): A Cooperative Deep Reinforcement Learning Framework for Transparent and Risk-Aware Automated Portfolio Management
Description
Abstract
Automated portfolio management via deep reinforcement learning (DRL) has demonstrated competitive risk-adjusted returns in controlled backtesting environments. Its deployment in regulated financial institutions, however, is impeded by two intertwined deficiencies: the opacity of monolithic neural policies, and the inability of existing multi-agent coordination mechanisms to capture the heterogeneous reasoning processes that underpin professional trading decisions. This paper introduces the Multi-Agent Explainable Trading System (MAETS), a cooperative multi-agent reinforcement learning (MARL) framework comprising four domain-specialized agents: a Fundamental Analysis Agent (FAA), a Technical Analysis Agent (TAA), a Sentiment Analysis Agent (SAA), and a Risk Management Agent (RMA). The agents are coordinated through a Graph Attention Network (GAT)-based centralized critic operating under the Centralized Training with Decentralized Execution (CTDE) paradigm. Each agent's policy is trained with Proximal Policy Optimization (PPO) and uses multi-head cross-attention over learned inter-agent message embeddings. Post-hoc explainability is provided through a three-stage pipeline: KernelSHAP attribution, counterfactual perturbation, and a FinBERT-conditioned natural language generation module. We also introduce the Fidelity-Completeness-Understandability (FCU) composite metric for evaluating the quality of AI-generated financial explanations. Backtested on a decade of data (2014–2023) from S&P 500, NIFTY 50, and CSI 300 constituents under realistic transaction-cost and slippage assumptions, MAETS achieves an annualized return of 31.6% (95% CI: ±0.9%), a Sharpe Ratio of 1.74, a Calmar Ratio of 2.82, and a maximum drawdown of 11.2%. It outperforms the strongest DRL baseline by 7.5 Sharpe points and by 6.1 percentage points in annualized return.
The FCU score of 0.91 represents a 68.5% improvement over post-hoc-attributed single-agent alternatives, with analyst understandability ratings averaging 4.5/5.0. These results establish that trading transparency and financial performance are not competing objectives but can be jointly optimized through principled multi-agent decomposition.
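The abstract reports annualized return, Sharpe Ratio, Calmar Ratio, and maximum drawdown without stating the exact conventions behind them. As a hedged reference only, the sketch below computes these four metrics from a series of simple daily returns under commonly used assumptions: 252 trading days per year, a geometric (compounded) annualized return, the sample standard deviation, and a zero risk-free rate unless one is supplied. These conventions and all function names are illustrative assumptions, not necessarily those used to evaluate MAETS.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # assumption: daily data, 252 trading days per year


def annualized_return(daily_returns: Sequence[float]) -> float:
    """Geometric (compounded) annualized return from simple daily returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns: Sequence[float], rf_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean daily excess return over its sample
    standard deviation, scaled by sqrt(252). rf_daily = 0 is an assumption."""
    excess = [r - rf_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve,
    returned as a positive fraction (0.112 means 11.2%)."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        mdd = max(mdd, 1.0 - equity / peak)
    return mdd


def calmar_ratio(daily_returns: Sequence[float]) -> float:
    """Annualized return divided by maximum drawdown.
    Raises ZeroDivisionError if the series never draws down."""
    return annualized_return(daily_returns) / max_drawdown(daily_returns)
```

Under this Calmar definition the abstract's figures are mutually consistent: 0.316 / 0.112 ≈ 2.82.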
Index Terms— Multi-agent reinforcement learning, explainable artificial intelligence, algorithmic trading, proximal policy optimization, SHAP attribution, cooperative agents, portfolio optimization, centralized training with decentralized execution.
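Both the abstract and the index terms name Proximal Policy Optimization (PPO) as the training algorithm for each agent's policy. For reference, the sketch below evaluates the standard PPO clipped surrogate objective, the sample mean of min(r·A, clip(r, 1 - ε, 1 + ε)·A), where r is the ratio of new to old policy probabilities for the taken action and A is the advantage estimate. The clip range ε = 0.2 is a common default assumed here; the abstract does not state the MAETS value, nor how its GAT-based centralized critic produces the advantages. The function name is hypothetical.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumption: common default, not taken from the paper
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # which removes the incentive to push the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the policy has not changed (ratio = 1), the objective reduces to the mean advantage; clipping only binds once an update moves the ratio outside the trust interval in the direction the advantage rewards.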
Files
| Checksum | Size |
|---|---|
| md5:882328e18919dc476b363b010de70d94 | 49.7 kB |
Additional details
Dates
- Submitted: 2026-04-01 (Manuscript submitted to IEEE TNNLS)