Published March 9, 2026 | Version v1
Preprint · Open Access

Variance Reduction Techniques in Off-Policy Reinforcement Learning with Imperfect State Representation

Authors/Creators

  • ETH Zurich

Description

Off-policy reinforcement learning (RL) can leverage previously collected data for training, enabling sample-efficient learning. When the state representation is imperfect, however, both policy evaluation and control suffer from increased variance. This paper investigates two variance reduction techniques, importance sampling with clipped ratios and weighted (self-normalized) importance sampling, in the context of off-policy RL with imperfect state representations. We analyze the theoretical properties of these estimators and present empirical results demonstrating that they mitigate the effects of imperfect state representations and improve learning stability.
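The two estimators named above can be illustrated with a minimal NumPy sketch of the general techniques (this is not the paper's implementation; the function names, the clip threshold, and the per-trajectory setup are assumptions):

```python
import numpy as np

def clipped_is_estimate(returns, ratios, clip=10.0):
    """Ordinary importance sampling with clipped ratios.

    returns: array of per-trajectory returns collected under the behavior policy.
    ratios:  array of importance ratios pi_target / pi_behavior per trajectory.
    Clipping the ratios at `clip` bounds the variance at the cost of bias.
    """
    w = np.minimum(ratios, clip)
    return float(np.mean(w * returns))

def weighted_is_estimate(returns, ratios):
    """Weighted (self-normalized) importance sampling.

    Normalizes by the sum of ratios instead of the sample count; the
    estimator is biased but consistent, and typically lower variance.
    """
    return float(np.sum(ratios * returns) / np.sum(ratios))

# Usage: estimate the target policy's value from off-policy data.
returns = np.array([2.0, 4.0, 1.0])
ratios = np.array([0.5, 1.5, 20.0])  # one ratio is very large
v_clipped = clipped_is_estimate(returns, ratios, clip=10.0)
v_weighted = weighted_is_estimate(returns, ratios)
```

With an imperfect state representation, the behavior and target action probabilities are computed from a lossy observation, which tends to inflate the ratios; both estimators above trade a controlled amount of bias for reduced variance in exactly that regime.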

Files

preprint_elena_rossi_20260309_005147.pdf (6.4 kB)
md5:995ccd254f2da6c08bddaede45149156
