Published August 4, 2025 | Version v1
Conference paper Open

KONDA: An LLM-based Tool for Semantic Annotation and Knowledge Graph Creation Using Ontologies for Research Data

  • 1. RWTH Aachen University
  • 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
  • 2. University of Amsterdam

Description

Achieving semantic interoperability of research data is key to enabling cross-domain data integration, reuse, and knowledge discovery [1]. While the need to align heterogeneous datasets using shared vocabularies and ontologies is widely recognized, doing so remains a considerable challenge in practice [2], [3]. Researchers face several challenges:

  • (C1) Lack of expertise in ontologies: Many researchers are unfamiliar with ontology engineering and semantic annotation.
  • (C2) Absence of established domain ontologies: While some domains, such as medicine, have well-established vocabularies, other domains, such as production engineering, may lack suitable or widely adopted ontologies, making it difficult to identify reusable options.
  • (C3) Technical barriers: The knowledge required to work with technologies such as RDF or mapping tools often presents an entry barrier.
  • (C4) Tool heterogeneity: Working with multiple disconnected tools adds cognitive and technical overhead.
  • (C5) Limited resources: Researchers typically face time constraints, making it difficult to invest in familiarizing themselves with complex tools or processes.
  • (C6) Proprietary solutions: Many semantic mapping tools (e.g., Talend [4]) are proprietary and not suitable for scientific work.

To address these challenges, we present KONDA, an LLM-based tool that supports semantic enrichment of research datasets and the construction of explorable knowledge graphs within a single integrated workflow. The KONDA workflow is as follows:

  • An interface prompts the user to upload their research dataset, along with optional supplementary documents (e.g., protocols, DMPs, README files) to provide the tool with context.
  • The user is supported in the selection of suitable ontologies via a direct integration with the TIB Terminology Service [5], with the option to add custom ontologies.
  • The tool performs automated LLM-based semantic annotation of the dataset using the provided context and selected ontologies.
  • A feedback screen enables the user to review and correct annotations.
  • The annotated data is provided in RDF format with an immediate visualization as a knowledge graph.

KONDA's architecture comprises a user interface, a server backend managing sessions and data processing, and an API layer that connects the tool to an LLM, where the semantic enrichment is conducted with techniques such as named entity recognition, relation extraction, and ontology-based annotation.

KONDA provides a guided, interactive tool in which users receive LLM-assisted suggestions and can intuitively explore their enriched data through automated knowledge graph creation, thus reducing the technical or formal training in semantic technologies that is required (C1, C3). The discovery of reusable ontologies is enabled through the integration of terminology services (C2). KONDA unifies the pipeline within a single, cohesive environment (C4). The tool's semi-automated workflow provides fast and visually supported results with minimal manual effort (C5) while retaining opportunities for human feedback to ensure output quality. Finally, KONDA's modular backend supports the deployment of both proprietary and open LLMs (C6).

KONDA empowers researchers to semantically enrich their datasets with minimal effort, offering an integrated and adaptable solution. Future development will focus on persistent graph storage, automated ontology recommendations, and evaluation in real-world settings. By leveraging LLMs and emphasizing usability, KONDA provides a robust foundation for advancing data interoperability across disciplines.
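The annotate, review, and export steps of the workflow described above can be illustrated with a minimal sketch. This is not KONDA's implementation: the names `Term`, `suggest_term`, `annotate`, and `to_ntriples` are hypothetical, the `example.org` IRIs are placeholders, and a simple string-similarity match stands in for the LLM call. The output uses the standard `rdfs:label` and `skos:closeMatch` properties, serialized as N-Triples.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from urllib.parse import quote

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Term:
    """An ontology term the user selected (hypothetical structure)."""
    label: str
    iri: str


def suggest_term(column, candidates):
    """Stand-in for the LLM step: pick the candidate with the most similar label."""
    def similarity(term):
        return SequenceMatcher(None, column.lower(), term.label.lower()).ratio()
    return max(candidates, key=similarity)


def annotate(columns, candidates, corrections=None):
    """Map each column to a term; user corrections override the suggestions."""
    corrections = corrections or {}
    return {c: corrections.get(c) or suggest_term(c, candidates) for c in columns}


def escape_literal(text):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(annotations, dataset_iri):
    """Serialize the column-to-term annotations as N-Triples lines."""
    lines = []
    for column, term in annotations.items():
        subject = f"<{dataset_iri}/column/{quote(column)}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(column)}" .')
        lines.append(f"{subject} <{SKOS_CLOSE_MATCH}> <{term.iri}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    terms = [
        Term("Temperature", "https://example.org/onto/Temperature"),
        Term("Pressure", "https://example.org/onto/Pressure"),
    ]
    suggested = annotate(["temperature", "spindle speed"], terms)
    # The review step: the user replaces one suggestion before export.
    reviewed = annotate(["temperature", "spindle speed"], terms,
                        corrections={"spindle speed": terms[1]})
    print(to_ntriples(reviewed, "https://example.org/dataset/1"))
```

Keeping the suggestion step behind a single function mirrors the modular backend mentioned in the description: under this assumption, swapping the similarity match for a call to a proprietary or open LLM would leave the review and export steps unchanged.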

Files

CoRDI_2025_paper_149.pdf
