Published June 4, 2026 | Version 1.0
Poster | Open access

Enhancing Spatial Data Discovery in Spatial Data Infrastructures (SDIs) using AI

  • 1. 52°North Spatial Information Research GmbH
  • 2. 52°North Spatial Information Research
  • 3. 52°North Initiative for Geospatial Open Source Software GmbH

Description

Discovering relevant datasets within Spatial Data Infrastructures (SDIs) is often hindered by rigid keyword-based search, language barriers, and poor handling of geographic relationships. While Large Language Models (LLMs) offer strong natural-language and multilingual processing capabilities, they are prone to hallucination and lack the spatial reasoning needed to match place names in natural language to the bounding-box coordinates of datasets.

This poster proposes a hybrid workflow that combines Retrieval-Augmented Generation (RAG) with geocoding and spatial indexing to overcome these limitations. An LLM-based parser first extracts core terms, themes, and locations from the user's natural-language query. If a location is detected, it is geocoded into a bounding box, which is then used to filter candidate datasets through a spatial index (R-trees in a PostGIS database). The remaining datasets undergo a semantic search using text embeddings, derived from the DCAT metadata titles, descriptions, and keywords and stored in a vector database. Finally, the retrieved datasets are re-ranked, and the LLM generates a verifiable response in the user's original language.

By accepting natural-language and place-name-based queries, the proposed solution lowers technical barriers for non-expert users. Grounding the LLM in the actual metadata mitigates hallucinations and enables cross-lingual discovery without explicit translation.
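The retrieval core of this workflow (a spatial pre-filter followed by semantic ranking) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation and rests on several assumptions: `Dataset`, `GAZETTEER`, `geocode`, `embed` and `search` are hypothetical names; the gazetteer extents are rough approximations; a linear bounding-box scan stands in for the PostGIS R-tree index; and simple term-count vectors stand in for a real text-embedding model and vector database. The LLM parsing, re-ranking and answer-generation steps are left out, so the query terms and place name are passed in as if the parser had already extracted them.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass

# Bounding box as (min_lon, min_lat, max_lon, max_lat) in WGS 84 degrees.
BBox = tuple[float, float, float, float]


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a DCAT metadata record."""
    title: str
    description: str
    keywords: list[str]
    bbox: BBox


# Hypothetical geocoder: a tiny lookup table with approximate extents,
# standing in for a real geocoding service.
GAZETTEER: dict[str, BBox] = {
    "münster": (7.47, 51.84, 7.77, 52.06),
    "berlin": (13.09, 52.34, 13.76, 52.68),
}


def geocode(place: str) -> BBox | None:
    return GAZETTEER.get(place.lower())


def intersects(a: BBox, b: BBox) -> bool:
    """Bounding-box overlap test (the check PostGIS's && operator performs)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def embed(text: str) -> Counter[str]:
    """Placeholder 'embedding': lower-cased term counts.
    A real system would call a text-embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(datasets: list[Dataset], terms: str, place: str | None = None, k: int = 3):
    """Hypothetical helper: spatial pre-filter, then semantic ranking."""
    candidates = datasets
    # Step 1: if a place was extracted and can be geocoded, keep only datasets
    # whose extent overlaps it. Unknown places fall back to no spatial filter.
    # (In PostGIS this might be a WHERE clause using && on an indexed column.)
    box = geocode(place) if place else None
    if box is not None:
        candidates = [d for d in datasets if intersects(d.bbox, box)]
    # Step 2: rank the survivors by similarity between the query terms and the
    # metadata text (title + description + keywords), highest score first.
    q = embed(terms)
    scored = [
        (cosine(q, embed(" ".join([d.title, d.description, *d.keywords]))), d)
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


catalog = [
    Dataset("Land use Münster", "Land use polygons for the city",
            ["land use", "urban"], (7.5, 51.9, 7.7, 52.0)),
    Dataset("Land use Berlin", "Land use polygons for Berlin",
            ["land use"], (13.1, 52.4, 13.7, 52.6)),
    Dataset("Air quality Münster", "Hourly NO2 measurements",
            ["air quality"], (7.55, 51.92, 7.68, 52.0)),
]

for score, d in search(catalog, "land use", place="Münster"):
    print(f"{score:.2f}  {d.title}")
```

Because the spatial filter runs first, the Berlin dataset is never scored for a Münster query. Narrowing the candidate set with a cheap bounding-box test before the typically more expensive embedding comparison is the motivation for placing the spatial index ahead of the semantic search.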

Files

James Ondieki, Matthes Rieke, Simon Jirka - Enhancing Spatial Data Discovery in Spatial Data Infrastructures (SDIs) using AI.pdf