Published January 1, 2025 | Version v1
Book chapter Open

Navigational Assistance for the Blind in Complex Indoor Spaces Using a Vision-Enabled Large Language Model

Description

This study introduces an implementation of a Large Language Model (LLM) that combines vision and natural language processing to improve navigation for individuals who are blind. Unlike traditional methods that rely on pre-existing maps or on environmental reconstruction with sensors such as LiDAR, our approach requires no prior environmental data and instead uses real-time visual cues, much as human navigation does. This allows the model to dynamically interpret and verbalize complex indoor environments, providing blind users with descriptive audio cues that convey the spatial layout and pertinent features of their surroundings. In experiments conducted in a hospital setting, this approach significantly improved GPT-4V's navigation capabilities and offered real-time, contextually relevant guidance, thereby enhancing the independence and safety of blind individuals navigating complex spaces. This research contributes to the understanding of AI's capabilities in real-world applications and opens new avenues for deploying language models in complex, dynamic environments.

Files

adewale-2025-navigational.pdf (130.9 kB)
md5:79044671967f84383a7a73af6d024f21

Additional details