Published May 20, 2025 | Version v1
Thesis Open

Non-Stationary Markov Decision Processes and Continual Reinforcement Learning Algorithms

  • 1. Indian Institute of Science Education and Research Berhampur

Description

Non-stationarity is a key challenge in sequential decision-making problems, affecting how learning algorithms adapt over time. Modern sequential decision-making algorithms do not fully account for non-stationarity in Markov Decision Processes (MDPs), even though it is common in real-world problems. In this thesis, we study and devise methods based on gradient alignment and sampled replay memory to learn an optimal policy over a sequence of MDPs whose components may change over time t. A primary challenge in continual reinforcement learning is to mitigate the well-known issue of "catastrophic forgetting". Given a sequence of MDPs and their corresponding optimal policies, catastrophic forgetting can be thought of as a decrease in expected return on previously encountered MDPs when learning the policy at time t. In this regard, we aim to develop algorithms for non-stationary MDPs that address this issue while ensuring fast adaptation from the previous task to the current task in reinforcement learning (RL). We build on existing classes of methods that reduce gradient interference and ensure gradient alignment during learning, so as to avoid catastrophic forgetting while initializing the next task with good inductive priors for fast adaptation.
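
As a rough illustration of the two ingredients named in the description, the sketch below shows one common way to test for gradient interference (a negative dot product between the gradient of the current task and that of an earlier task) and one common way to keep a sampled replay memory (reservoir sampling). This is a minimal sketch under stated assumptions, not the thesis's actual method: the names `gradients_interfere` and `ReservoirReplay` are hypothetical, gradients are represented as plain lists of floats, and the choice of reservoir sampling is an assumption, since the description does not say how the replay memory is sampled.

```python
import random


def dot(g1, g2):
    """Dot product of two gradient vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(g1, g2))


def gradients_interfere(g_new, g_old):
    """Return True when the two gradients point in opposing directions.

    To first order, a descent step along g_new changes the old task's loss
    by -lr * dot(g_new, g_old), so a negative dot product means the step
    is expected to increase the old task's loss (interference).
    """
    return dot(g_new, g_old) < 0.0


class ReservoirReplay:
    """Fixed-size memory holding a uniform sample of all items seen so far.

    Assumption: reservoir sampling (Algorithm R) stands in for the
    "sampled replay memory" mentioned in the description.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(transition)
        else:
            # Keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = transition

    def sample(self, k):
        """Draw up to k stored items without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))
```

In a continual-learning loop, one might compute the gradient on a batch from the current MDP and on a batch drawn from the replay memory, then use a check like `gradients_interfere` to decide whether the update needs to be adjusted before it is applied.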

Files

Sandesh_Katakam_MS_Thesis.pdf

Files (2.3 MB)

md5:2ff7319f83e27d2e51ec3af2ee26c539