Published May 16, 2026 | Version v1

When Agents Get Lost: Dissecting Failure Modes in Graph-Based Navigation Instruction Evaluation

Description

Vision-and-Language Navigation (VLN) requires agents to interpret natural language instructions through spatial reasoning, yet evaluating instruction quality remains challenging when agents fail. Understanding why navigation instructions fail requires a principled, systematic analysis of failure patterns in spatial reasoning tasks. To this end, we first present a taxonomy of navigation instruction failures that groups failure cases into four categories: (i) linguistic properties, (ii) topological constraints, (iii) agent limitations, and (iv) execution barriers. We then introduce a dataset of 492 annotated failed navigation traces collected from GROKE, a vision-free evaluation framework that uses OpenStreetMap (OSM) data. The dataset characterizes failure dynamics in spatial grounding to guide the development of better instruction generation, evaluation systems, and navigation agents. Our analysis of the GROKE failure traces shows that agent limitations (74.2%) constitute the dominant error category, with stop-location errors and planning failures as the most frequent subcategories.

The dataset and taxonomy together provide actionable insights that enable instruction generation systems to identify and avoid under-specification patterns while allowing evaluation frameworks to systematically distinguish between instruction quality issues and agent-specific artifacts.
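As an illustration of how the four-category taxonomy described above could be applied to annotated traces, the following is a minimal Python sketch. The `FailureTrace` record, its field names, and the `category_shares` helper are assumptions made for this example and do not reflect the actual GROKE or dataset schema; the four toy traces are placeholders, not data from the 492-trace dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    """The four top-level categories named in the description."""
    LINGUISTIC_PROPERTIES = "linguistic properties"
    TOPOLOGICAL_CONSTRAINTS = "topological constraints"
    AGENT_LIMITATIONS = "agent limitations"
    EXECUTION_BARRIERS = "execution barriers"


@dataclass(frozen=True)
class FailureTrace:
    """Hypothetical record for one annotated failed trace (assumed schema)."""
    trace_id: str
    category: FailureCategory
    subcategory: str


def category_shares(traces):
    """Return the percentage of traces per category (0.0 for unseen ones)."""
    if not traces:
        return {}
    counts = Counter(trace.category for trace in traces)
    total = len(traces)
    return {cat: 100.0 * counts.get(cat, 0) / total for cat in FailureCategory}


# Toy placeholder traces, NOT taken from the dataset.
toy_traces = [
    FailureTrace("t1", FailureCategory.AGENT_LIMITATIONS, "stop-location error"),
    FailureTrace("t2", FailureCategory.AGENT_LIMITATIONS, "planning failure"),
    FailureTrace("t3", FailureCategory.LINGUISTIC_PROPERTIES, "(unspecified)"),
    FailureTrace("t4", FailureCategory.TOPOLOGICAL_CONSTRAINTS, "(unspecified)"),
]

shares = category_shares(toy_traces)
for category, share in shares.items():
    print(f"{category.value}: {share:.1f}%")
```

Applied to the real annotations, a computation of this kind would yield the per-category percentages reported in the description, such as the 74.2% share for agent limitations, provided each trace carries exactly one top-level label (also an assumption here).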

Code: https://fuzsh.github.io/lost/

Files

GeoAI-Paper-3845.pdf (193.1 kB)
md5:2b2237c17eb5c0d1545661b5733dc9a8

Additional details

Funding

Research Council of Finland
Knowledgeable and Multimodal Geographic Large Language Models Grounded with Reasoning and Retrieval (grant 368679)

Software

Repository URL
https://fuzsh.github.io/lost/
Development Status
Active