Non-Stationary Markov Decision Processes and Continual Reinforcement Learning Algorithms
Authors/Creators
Description
Non-stationarity is a key challenge in sequential decision-making problems, affecting how learning algorithms adapt over time. Modern sequential decision-making algorithms do not fully account for non-stationarity in Markov Decision Processes (MDPs), even though it is common in real-world problems. In this thesis, we study and devise methods based on gradient alignment and sampled replay memory to learn an optimal policy over a sequence of MDPs whose components may change over time t. A primary challenge in continual reinforcement learning is to mitigate the well-known issue of "catastrophic forgetting". Given a sequence of MDPs and their corresponding optimal policies, catastrophic forgetting can be thought of as a decrease in expected return on previously learned MDPs when learning the policy at time t. In this regard, we aim to develop algorithms that address this issue while ensuring fast adaptation from the previous task to the current task in reinforcement learning (RL). We build on existing classes of methods that reduce gradient interference and promote gradient alignment during learning, so as to avoid catastrophic forgetting while initializing the next task with good inductive priors for fast adaptation.
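The two ingredients named in the description, gradient alignment and sampled replay memory, can be illustrated with a minimal sketch. This is not the thesis's algorithm. It assumes gradients are flat NumPy vectors, uses the sign of their dot product as the interference test, applies a projection step of the kind used in gradient-episodic-memory methods, and uses reservoir sampling as one possible way to sample a replay memory. All names (`interferes`, `project_if_conflicting`, `ReservoirBuffer`) are hypothetical.

```python
import random
import numpy as np

def interferes(g_new, g_old):
    """Return True when the two gradients conflict (negative dot product)."""
    return float(np.dot(g_new, g_old)) < 0.0

def project_if_conflicting(g_new, g_old):
    """If g_new conflicts with g_old, remove its component along g_old.

    Assumption: a single reference gradient g_old, e.g. computed on
    transitions replayed from earlier tasks.
    """
    dot = float(np.dot(g_new, g_old))
    if dot >= 0.0:
        return g_new  # already aligned (or orthogonal): leave unchanged
    return g_new - (dot / float(np.dot(g_old, g_old))) * g_old

class ReservoirBuffer:
    """Fixed-size replay memory holding a uniform sample of all items seen."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k stored items without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))

# Usage: a current-task gradient that conflicts with a replayed one.
g_current = np.array([1.0, -1.0])
g_replay = np.array([0.0, 1.0])
g_safe = project_if_conflicting(g_current, g_replay)

buffer = ReservoirBuffer(capacity=5)
for step in range(100):
    buffer.add(step)  # stand-in for a (state, action, reward, next_state) tuple
```

After projection, the update no longer has a negative component along the replayed gradient, so a small step along it does not, to first order, increase the loss on the replayed data. The reservoir buffer keeps memory bounded while giving every past transition the same probability of being retained.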
Files
| Name | Size |
|---|---|
| Sandesh_Katakam_MS_Thesis.pdf (md5:2ff7319f83e27d2e51ec3af2ee26c539) | 2.3 MB |