Published March 9, 2026 | Version v1
Preprint · Open Access

Variance Reduction Techniques in Off-Policy Reinforcement Learning with Imperfect State Representation

Authors/Creators

  • ETH Zurich

Description

Off-policy reinforcement learning (RL) can leverage previously collected data for training, enabling sample-efficient learning. When the state representation is imperfect, however, both policy evaluation and control suffer from increased variance. This paper investigates two variance reduction techniques, importance sampling with clipped ratios and weighted (self-normalized) importance sampling, in the context of off-policy RL with imperfect state representations. We analyze the theoretical properties of these estimators and present empirical results demonstrating that they mitigate the effects of imperfect state representations and improve learning stability.
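The two estimators named above can be illustrated with a minimal NumPy sketch of the general techniques (this is not the paper's implementation; the function names, the clip threshold, and the per-trajectory setup are assumptions):

```python
import numpy as np

def clipped_is_estimate(returns, ratios, clip=10.0):
    """Ordinary importance sampling with clipped ratios.

    returns: array of per-trajectory returns collected under the behavior policy.
    ratios:  array of importance ratios pi_target / pi_behavior per trajectory.
    Clipping the ratios at `clip` bounds the variance at the cost of bias.
    """
    w = np.minimum(ratios, clip)
    return float(np.mean(w * returns))

def weighted_is_estimate(returns, ratios):
    """Weighted (self-normalized) importance sampling.

    Normalizes by the sum of ratios instead of the sample count; the
    estimator is biased but consistent, and typically lower variance.
    """
    return float(np.sum(ratios * returns) / np.sum(ratios))

# Usage: estimate the target policy's value from off-policy data.
returns = np.array([2.0, 4.0, 1.0])
ratios = np.array([0.5, 1.5, 20.0])  # one ratio is very large
v_clipped = clipped_is_estimate(returns, ratios, clip=10.0)
v_weighted = weighted_is_estimate(returns, ratios)
```

With an imperfect state representation, the behavior and target action probabilities are computed from a lossy observation, which tends to inflate the ratios; both estimators above trade a controlled amount of bias for reduced variance in exactly that regime.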

Files

preprint_elena_rossi_20260309_005147.pdf (6.4 kB)
md5:995ccd254f2da6c08bddaede45149156
