Published January 1, 2026
| Version v1
Journal article
Open
Design Of Reinforcement Learning Grid World Navigation System Using Rewards And Penalties: Q-Learning, SARSA And Double Q-Learning
Authors/Creators
Description
This paper presents a systematic comparative study of three tabular reinforcement learning (RL) algorithms—Q-Learning, State-Action-Reward-State-Action (SARSA), and Double Q-Learning—deployed within a configurable stochastic GridWorld environment. The environment incorporates slip-based stochastic transitions, trap cells, potential-based reward shaping grounded in the theoretical guarantees of Ng et al. [1], and partial observability modes. The central research hypothesis investigates whether Double Q-Learning's decoupled selection-evaluation mechanism demonstrably reduces maximization bias compared to vanilla Q-Learning, particularly under elevated stochastic transition probabilities. An interactive web-based research platform is developed using Flask and Chart.js, enabling real-time policy visualization, value-function heatmaps, Q-table analysis, and multi-seed benchmark comparisons with confidence intervals. Experimental results across three canonical grid configurations demonstrate that Double Q-Learning achieves superior convergence stability and reduced overestimation in high-slip environments, while SARSA exhibits inherently conservative on-policy behavior that trades off peak performance for robustness near traps.
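The decoupled selection-evaluation mechanism the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Double Q-Learning update (van Hasselt, 2010), not the paper's actual implementation; the dict-based Q-tables, the four-action grid world, and all parameter defaults are assumptions for the example.

```python
import random

def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99, n_actions=4):
    """One tabular Double Q-Learning step.

    qa, qb: dicts mapping (state, action) -> value, defaulting to 0.0.
    With probability 0.5 we update table A: A picks the greedy next
    action, but B supplies its value estimate (and vice versa), which
    breaks the coupling that causes Q-Learning's maximization bias.
    """
    actions = range(n_actions)
    if random.random() < 0.5:
        # A selects the greedy next action; B evaluates it.
        a_star = max(actions, key=lambda x: qa.get((s_next, x), 0.0))
        target = r + gamma * qb.get((s_next, a_star), 0.0)
        qa[(s, a)] = qa.get((s, a), 0.0) + alpha * (target - qa.get((s, a), 0.0))
    else:
        # Symmetric case: B selects, A evaluates.
        b_star = max(actions, key=lambda x: qb.get((s_next, x), 0.0))
        target = r + gamma * qa.get((s_next, b_star), 0.0)
        qb[(s, a)] = qb.get((s, a), 0.0) + alpha * (target - qb.get((s, a), 0.0))
```

Because the selecting table never scores its own chosen action, a noisy upward error in one table no longer inflates the target the way a single `max` over one Q-table does, which is the bias-reduction effect the study measures under high slip probabilities.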
Files
- IJSRET_V12_issue2_501.pdf (397.5 kB), md5:87828d851042aed9705c34c9c21d7b01
Additional details
Related works
- Has part
- Journal article: https://ijsret.com/wp-content/uploads/IJSRET_V12_issue2_501.pdf (URL)
- Is identical to
- Journal article: https://ijsret.com/2026/04/27/design-of-reinforcement-learning-grid-world-navigation-system-using-rewards-and-penalties-q-learning-sarsa-and-double-q-learning/ (URL)