Published January 1, 2026
| Version v1
Journal article
Open
Design Of Reinforcement Learning Grid World Navigation System Using Rewards And Penalties: Q-Learning, SARSA And Double Q-Learning
Authors/Creators
Description
This paper presents a systematic comparative study of three tabular reinforcement learning (RL) algorithms—Q-Learning, State-Action-Reward-State-Action (SARSA), and Double Q-Learning—deployed within a configurable stochastic GridWorld environment. The environment incorporates slip-based stochastic transitions, trap cells, potential-based reward shaping grounded in the theoretical guarantees of Ng et al. [1], and partial observability modes. The central research hypothesis investigates whether Double Q-Learning's decoupled selection-evaluation mechanism demonstrably reduces maximization bias compared to vanilla Q-Learning, particularly under elevated stochastic transition probabilities. An interactive web-based research platform is developed using Flask and Chart.js, enabling real-time policy visualization, value-function heatmaps, Q-table analysis, and multi-seed benchmark comparisons with confidence intervals. Experimental results across three canonical grid configurations demonstrate that Double Q-Learning achieves superior convergence stability and reduced overestimation in high-slip environments, while SARSA exhibits inherently conservative on-policy behavior that trades off peak performance for robustness near traps.
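The decoupled selection-evaluation mechanism the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Double Q-Learning update (van Hasselt, 2010), not the paper's actual implementation; the dict-based Q-tables, the four-action grid world, and all parameter defaults are assumptions for the example.

```python
import random

def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99, n_actions=4):
    """One tabular Double Q-Learning step.

    qa, qb: dicts mapping (state, action) -> value, defaulting to 0.0.
    With probability 0.5 we update table A: A picks the greedy next
    action, but B supplies its value estimate (and vice versa), which
    breaks the coupling that causes Q-Learning's maximization bias.
    """
    actions = range(n_actions)
    if random.random() < 0.5:
        # A selects the greedy next action; B evaluates it.
        a_star = max(actions, key=lambda x: qa.get((s_next, x), 0.0))
        target = r + gamma * qb.get((s_next, a_star), 0.0)
        qa[(s, a)] = qa.get((s, a), 0.0) + alpha * (target - qa.get((s, a), 0.0))
    else:
        # Symmetric case: B selects, A evaluates.
        b_star = max(actions, key=lambda x: qb.get((s_next, x), 0.0))
        target = r + gamma * qa.get((s_next, b_star), 0.0)
        qb[(s, a)] = qb.get((s, a), 0.0) + alpha * (target - qb.get((s, a), 0.0))
```

Because the selecting table never scores its own chosen action, a noisy upward error in one table no longer inflates the target the way a single `max` over one Q-table does, which is the bias-reduction effect the study measures under high slip probabilities.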
Files
- IJSRET_V12_issue2_501.pdf (397.5 kB), md5:87828d851042aed9705c34c9c21d7b01
Additional details
Related works
- Has part
- Journal article: https://ijsret.com/wp-content/uploads/IJSRET_V12_issue2_501.pdf (URL)
- Is identical to
- Journal article: https://ijsret.com/2026/04/27/design-of-reinforcement-learning-grid-world-navigation-system-using-rewards-and-penalties-q-learning-sarsa-and-double-q-learning/ (URL)