Replication Package for "The Limits of Reinforcement Learning in Statistical Arbitrage: A Large-Universe Benchmark Against Classical Mean Reversion"

Drakopoulou, Veliota

doi:10.5281/zenodo.20828321

Published June 24, 2026 | Version v4

Resource type: Dataset | Access: Open

Drakopoulou, Veliota¹

1. The University of Arizona Global Campus

RL_StatArb_Replication_Russell3000 is the public replication package for the paper:

“The Limits of Reinforcement Learning in Statistical Arbitrage: A Large-Universe Benchmark Against Classical Mean Reversion.”

This package provides the public-safe replication materials for a large-universe statistical-arbitrage study that compares a transparent classical z-score mean-reversion benchmark with four reinforcement-learning agents: Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). The study uses a Bloomberg Russell 3000-based equity environment with dynamic pair selection, transaction costs, market-regime variables, and rolling walk-forward validation over 2016–2025. The empirical design compares the discrete-action agents (PPO and A2C) with the continuous-action agents (SAC and TD3), and also tests a reward-shaped A2C extension designed to reduce excessive exposure. The Russell 3000 data pipeline started from 2,901 exported Bloomberg members, successfully downloaded OHLCV and market-capitalization data for 2,892 securities, and retained 1,269 securities in the filtered tradable universe.
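To make the classical benchmark concrete, the following is a minimal sketch of a rolling z-score mean-reversion rule on a pair spread, with a proportional transaction cost. It is not the package's implementation: the function names, the lookback length, the entry and exit thresholds, and the cost convention are all illustrative assumptions.

```python
from statistics import mean, stdev


def zscore_positions(spread, lookback=20, entry=2.0, exit_=0.5):
    """Return one position per step: +1 long the spread, -1 short, 0 flat.

    Assumed rule (illustrative): open against the deviation when |z| exceeds
    `entry`, close when |z| falls below `exit_`.
    """
    positions = []
    pos = 0
    for t in range(len(spread)):
        if t < lookback:
            positions.append(0)  # not enough history yet
            continue
        window = spread[t - lookback:t]  # past values only, no look-ahead
        mu, sigma = mean(window), stdev(window)
        z = 0.0 if sigma == 0 else (spread[t] - mu) / sigma
        if pos == 0:
            if z > entry:
                pos = -1  # spread is rich: short it
            elif z < -entry:
                pos = 1   # spread is cheap: buy it
        elif abs(z) < exit_:
            pos = 0       # spread has reverted: close
        positions.append(pos)
    return positions


def net_pnl(spread, positions, cost_per_unit=0.001):
    """Per-step P&L of holding positions[t-1] over step t, minus a cost
    proportional to the position change made at step t (assumed convention)."""
    pnl = []
    for t in range(1, len(spread)):
        gross = positions[t - 1] * (spread[t] - spread[t - 1])
        cost = cost_per_unit * abs(positions[t] - positions[t - 1])
        pnl.append(gross - cost)
    return pnl
```

In this sketch the position chosen at step t earns the spread change over the following step, so the signal never uses information from the period it trades.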

The results show that greater algorithmic flexibility does not automatically improve statistical-arbitrage performance. The z-score benchmark achieves the strongest average risk-adjusted results, producing the highest Sharpe, Sortino, and Calmar ratios across the walk-forward test windows. PPO and A2C learn active trading policies but remain overexposed and fail to outperform the benchmark after transaction costs. Directional reward shaping reduces A2C’s active rate from 96.1% to 33.5% and improves drawdown control, but it does not close the risk-adjusted performance gap. SAC and TD3 also underperform, despite their ability to adjust position size continuously. The package therefore supports replication of the paper’s central finding: standard reinforcement-learning agents require stronger portfolio-level risk control before they can reliably improve on transparent mean-reversion rules.
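The risk-adjusted metrics named above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact definitions: a zero risk-free rate and zero downside target, 252 trading periods per year, and an "active rate" defined as the share of steps with a non-zero position.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def sharpe(returns, periods=252):
    """Annualized mean return over sample standard deviation (risk-free = 0)."""
    mu = _mean(returns)
    var = sum((r - mu) ** 2 for r in returns) / (len(returns) - 1)
    return float("nan") if var == 0 else math.sqrt(periods) * mu / math.sqrt(var)


def sortino(returns, periods=252):
    """Like Sharpe, but penalizes only returns below zero."""
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    if downside == 0:
        return float("nan")
    return math.sqrt(periods) * _mean(returns) / downside


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar(returns, periods=252):
    """Compound annual growth rate divided by maximum drawdown."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    cagr = growth ** (periods / len(returns)) - 1.0
    mdd = max_drawdown(returns)
    return float("nan") if mdd == 0 else cagr / mdd


def active_rate(positions):
    """Share of steps with a non-zero position (assumed definition)."""
    return sum(1 for p in positions if p != 0) / len(positions)
```

Under this assumed definition, an active rate of 96.1% would mean the agent holds a position on almost every step, which is the overexposure that the reward-shaped A2C variant is designed to reduce.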

Full Public-Safe Code Pipeline Included

This replication package includes the full public-safe code pipeline from Bloomberg data collection to model training, comparison, and statistical validation:

01 Build Russell 3000 ticker list from Bloomberg export
02 Download Bloomberg OHLCV and market-cap data
03 Download market-regime data
04 Download Bloomberg metadata
05 Clean/filter stock panel and build tradable universe
06 Dynamic pair selection
07 Build RL pair dataset
08 Train PPO and A2C
09 Train directional reward-shaped A2C
10 Train SAC continuous-action model
11 Train TD3 continuous-action model
12 Compare all algorithms
13 Run statistical tests

The code implements a common empirical design so that all models are evaluated using the same data environment, dynamic pair-selection process, transaction-cost assumption, and walk-forward validation protocol. The methodology compares four model families: the classical z-score benchmark, discrete-action PPO/A2C agents, reward-shaped A2C, and continuous-action SAC/TD3 agents.
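A rolling walk-forward protocol of the kind described above can be sketched as a generator of train/test index ranges that every model then shares. The window lengths and the function name here are hypothetical; the package's own window sizes are set in its code and run-order documentation.

```python
def walk_forward_windows(n_obs, train_len, test_len, step=None):
    """Return a list of ((train_start, train_end), (test_start, test_end)).

    Ranges are half-open index intervals. Each test window starts exactly
    where its training window ends, so no test observation is seen in training.
    By default the window rolls forward by one test length.
    """
    if train_len <= 0 or test_len <= 0:
        raise ValueError("train_len and test_len must be positive")
    step = step or test_len
    windows = []
    start = 0
    while start + train_len + test_len <= n_obs:
        train = (start, start + train_len)
        test = (start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += step
    return windows
```

Evaluating the benchmark and every agent on the same list of windows is what keeps the comparison like for like: differences in results then come from the models rather than from the data splits.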

Bloomberg Data Restriction Notice

This study uses Bloomberg-derived data, including:

Russell 3000 equity OHLCV data
Market capitalization data
Sector and industry classification data
S&P 500 market data
VIX data
U.S. Treasury yield data
Bloomberg Russell 3000 membership information

Due to Bloomberg and institutional data-licensing restrictions, the raw and processed Bloomberg-derived datasets are not redistributed in this public Zenodo record.

This package does not include:

Raw Bloomberg OHLCV files
Processed Parquet datasets
Russell 3000 membership exports
Bloomberg GICS metadata exports
Market-regime data files derived from Bloomberg
Trained models derived from Bloomberg data
Detailed trade logs
Daily return logs

Researchers with appropriate Bloomberg access may reproduce the dataset by running the provided data-collection and preprocessing scripts in the documented order.
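Running the scripts in order could be automated along the following lines. This sketch assumes, without confirmation from the package, that each script is a `.py` file whose name begins with its two-digit step number (for example a hypothetical `01_build_tickers.py`); the run-order documentation in the archive is authoritative.

```python
import re
from pathlib import Path

# Hypothetical naming convention: two-digit step prefix, then "_" or "-".
STEP_PATTERN = re.compile(r"^(\d{2})[_\-].*\.py$")


def ordered_scripts(code_dir):
    """Return the step scripts in `code_dir`, sorted by numeric prefix."""
    found = []
    for path in Path(code_dir).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def run_commands(code_dir, python="python"):
    """Build one command per step, ready to pass to subprocess.run."""
    return [[python, str(path)] for path in ordered_scripts(code_dir)]
```

Each command could then be executed with `subprocess.run(cmd, check=True)` so that a failure in an early step, such as a Bloomberg download, stops the pipeline before later steps run on incomplete data.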

Package Contents

This public replication package includes:

Source code
Run-order documentation
Data-availability statement
Bloomberg data-restriction notice
Aggregate empirical results
All-algorithm comparison tables
Statistical-test summaries
APA-style manuscript tables
Color figures
Figure source data
Replication metadata
Citation file
Requirements file

It is designed to support transparency and reproducibility while respecting Bloomberg data-licensing restrictions.

Creator Metadata

Veliota Drakopoulou
Higher Colleges of Technology, United Arab Emirates
Embry-Riddle Aeronautical University, United States
ORCID: 0000-0002-1670-8033
Contact: vdrakopoulou@hct.ac.ae

Suggested Keywords

Statistical arbitrage
Pairs trading
Reinforcement learning
Mean reversion
Z-score trading
PPO
A2C
SAC
TD3
Reward shaping
Position sizing
Cointegration
Russell 3000
Bloomberg
OHLCV
Walk-forward validation
Algorithmic trading
Intelligent trading systems
Quantitative finance

Files

Name	Size	Checksum
RL_StatArb_Replication_Russell3000.zip	40.2 kB	md5:f91e06aae79e367195b645657bbcbfd4
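The MD5 checksum listed for the archive can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows; the helper names are illustrative, and the local path is whatever location the archive was saved to.

```python
import hashlib

# Checksum as published on this record for RL_StatArb_Replication_Russell3000.zip
EXPECTED_MD5 = "f91e06aae79e367195b645657bbcbfd4"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("RL_StatArb_Replication_Russell3000.zip")` should return `True` for an uncorrupted copy of this version of the archive. MD5 is suitable here as an integrity check against transfer errors, not as protection against deliberate tampering.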

Additional details

Programming language: Python
