When Agents Get Lost: Dissecting Failure Modes in Graph-Based Navigation Instruction Evaluation
Description
Vision-and-Language Navigation (VLN) requires agents to interpret natural language instructions and reason about space, yet evaluating instruction quality remains difficult when agents fail. This gap calls for a principled understanding of why navigation instructions fail, which in turn requires a systematic analysis of failure patterns in spatial reasoning tasks. To this end, we first present a taxonomy of navigation instruction failures that groups failure cases into four categories: (i) linguistic properties, (ii) topological constraints, (iii) agent limitations, and (iv) execution barriers. We then introduce a dataset of 492 annotated navigation failure traces collected from GROKE, a vision-free evaluation framework built on OpenStreetMap (OSM) data. The dataset characterizes failure dynamics in spatial grounding to guide the development of better instruction generation systems, evaluation systems, and navigation agents. Our analysis of these GROKE failure traces shows that agent limitations (74.2%) are the dominant error category, with stop-location errors and planning failures as the most frequent subcategories.
The dataset and taxonomy together provide actionable insights that enable instruction generation systems to identify and avoid under-specification patterns while allowing evaluation frameworks to systematically distinguish between instruction quality issues and agent-specific artifacts.
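The four-category taxonomy can be made concrete as a small labeling schema plus a routine that reports each category's share of annotated traces and the most frequent subcategories within a category. The sketch below is hypothetical: the names `FailureCategory`, `FailureTrace`, `category_shares`, and `top_subcategories`, and the free-text `subcategory` field, are assumptions for illustration and do not describe the released dataset's actual format. The toy traces are invented and are not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    """Top-level categories named in the taxonomy."""
    LINGUISTIC_PROPERTIES = "linguistic properties"
    TOPOLOGICAL_CONSTRAINTS = "topological constraints"
    AGENT_LIMITATIONS = "agent limitations"
    EXECUTION_BARRIERS = "execution barriers"


@dataclass(frozen=True)
class FailureTrace:
    """Hypothetical record for one annotated failure trace (assumed schema)."""
    trace_id: str
    category: FailureCategory
    subcategory: str  # free-text label, e.g. "stop-location error"


def category_shares(traces: list[FailureTrace]) -> dict[FailureCategory, float]:
    """Fraction of traces in each top-level category (0.0 for unused ones)."""
    if not traces:
        raise ValueError("need at least one trace")
    counts = Counter(t.category for t in traces)
    return {c: counts.get(c, 0) / len(traces) for c in FailureCategory}


def top_subcategories(traces: list[FailureTrace], category: FailureCategory, k: int = 2) -> list[str]:
    """The k most frequent subcategory labels within one category."""
    counts = Counter(t.subcategory for t in traces if t.category is category)
    return [name for name, _ in counts.most_common(k)]


# Toy, invented traces for illustration only (not from the dataset).
toy = [
    FailureTrace("t1", FailureCategory.AGENT_LIMITATIONS, "stop-location error"),
    FailureTrace("t2", FailureCategory.AGENT_LIMITATIONS, "planning failure"),
    FailureTrace("t3", FailureCategory.AGENT_LIMITATIONS, "stop-location error"),
    FailureTrace("t4", FailureCategory.LINGUISTIC_PROPERTIES, "under-specification"),
]
shares = category_shares(toy)
print(f"{shares[FailureCategory.AGENT_LIMITATIONS]:.1%}")  # 75.0%
print(top_subcategories(toy, FailureCategory.AGENT_LIMITATIONS))  # ['stop-location error', 'planning failure']
```

Keying the shares on the enum rather than raw strings means every category appears in the report, including ones with zero traces, which keeps per-category comparisons across evaluation runs aligned.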
Code: https://fuzsh.github.io/lost/
Files
| Name | Size | MD5 checksum |
|---|---|---|
| GeoAI-Paper-3845.pdf | 193.1 kB | 2b2237c17eb5c0d1545661b5733dc9a8 |
Additional details
Funding
- Research Council of Finland: Knowledgeable and Multimodal Geographic Large Language Models Grounded with Reasoning and Retrieval (368679)
Software
- Repository URL: https://fuzsh.github.io/lost/
- Development Status: Active