Non-Stationary Markov Decision Processes and Continual Reinforcement Learning Algorithms

Katakam, Sandesh

doi:10.5281/zenodo.15471128

Published May 20, 2025 | Version v1

Thesis Open

Katakam, Sandesh (Researcher)¹

1. Indian Institute of Science Education and Research Berhampur

Non-stationarity is a key challenge in sequential decision-making problems because it affects how learning algorithms adapt over time. Modern sequential decision-making algorithms do not fully account for non-stationarity in Markov Decision Processes (MDPs), even though it is common in real-world problems. In this thesis, we study and devise methods based on gradient alignment and sampled replay memory to learn an optimal policy over a sequence of MDPs whose components may change over time t. A primary challenge in continual reinforcement learning is mitigating a well-known issue called "catastrophic forgetting". Given a sequence of MDPs and their corresponding optimal policies, catastrophic forgetting can be thought of as a decrease in expected return on earlier MDPs when the policy is updated at time t. We therefore aim to develop algorithms for MDPs that address this issue while ensuring fast adaptation from the previous task to the current task in Reinforcement Learning (RL). We build on existing classes of methods that reduce gradient interference and ensure gradient alignment during learning, so as to avoid catastrophic forgetting while initializing the next task with good inductive priors for fast adaptation.
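
To make the idea of gradient interference concrete, here is a minimal sketch of one well-known alignment rule from the continual learning literature: the projection step used by A-GEM. It is an illustration only and is not taken from the thesis; the function names `dot` and `align_gradient` and the toy gradient vectors are assumptions made for this example. The gradient on the current task is compared with a gradient computed on samples from replay memory. If their inner product is negative, the two interfere, and the component of the current gradient that opposes the memory gradient is removed.

```python
def dot(u, v):
    """Inner product of two equal-length gradient vectors (plain lists)."""
    return sum(a * b for a, b in zip(u, v))


def align_gradient(g_new, g_mem):
    """Return an update direction that does not oppose g_mem.

    g_new: gradient on the current task (hypothetical flat list of floats).
    g_mem: gradient on samples drawn from replay memory (same length).
    This follows the A-GEM projection rule, used here as an illustration;
    it is an assumption, not the method proposed in the thesis.
    """
    interference = dot(g_new, g_mem)
    if interference >= 0.0:
        # Gradients already agree (or are orthogonal): keep g_new unchanged.
        return list(g_new)
    # Remove the component of g_new that points against g_mem.
    scale = interference / dot(g_mem, g_mem)
    return [a - scale * b for a, b in zip(g_new, g_mem)]


# Toy example: the new-task gradient opposes the memory gradient along axis 0.
g_mem = [1.0, 0.0]
g_new = [-1.0, 1.0]
g_aligned = align_gradient(g_new, g_mem)
```

After projection, `g_aligned` has a non-negative inner product with `g_mem`, so to first order a step along it does not increase the loss on the replayed samples. A gradient that already agrees with the memory gradient is returned unchanged.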

Files

Name	Size
Sandesh_Katakam_MS_Thesis.pdf (md5:2ff7319f83e27d2e51ec3af2ee26c539)	2.3 MB
