Published August 4, 2025 | Version v1
Conference paper
Open
An Ontology-Based Text Annotation Tool for NFDI4Energy
Contributors
Editor (2):
- 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
- 2. University of Amsterdam
Description
To manage research data in accordance with the FAIR principles, the data must be annotated with standardised terminologies, for example as keywords in the metadata of scientific articles. In NFDI4Energy, we are working on the Open Energy Ontology (OEO) to meet this challenge. However, manually annotating text-based research data (such as reports) with this ontology is time-consuming. Moreover, the person performing the annotation must have a good understanding of the structure and use of the ontology, including all of its elements and their definitions. This paper presents an ontology-based annotation tool that uses the NFDI4Energy Terminology Service Collection [1] to highlight relevant terms in text, linking unstructured data to structured knowledge. The OEO and other ontologies in the collection are accessible via this service.
Several ontology-based annotation tools already exist. For example, ZOOMA from EBI [2] links terms to biomedical ontologies using data sources and ontologies, but it requires input text in a specific format, detects only one match per line, and does not highlight terms in context. The BioPortal Annotator represents a more comprehensive solution in the biomedical domain. It performs direct annotation using large curated dictionaries (from UMLS and NCBO ontologies) and applies semantic expansion techniques based on ontology hierarchies and cross-ontology mappings [3]. Building on these ideas, the NFDI4Energy Annotator is tailored for the energy domain. It supports direct annotation of free text and highlights matched terms. While the current approach is simple, future enhancements will include context-aware annotation based on semantic meaning and customizable options for users to control the number and level of annotations based on the ontology hierarchy.
To detect relevant terms despite synonyms or varied phrasing, the tool uses fuzzy matching based on Levenshtein distance [4], enabling approximate matches through character-level comparisons. Ontology data is retrieved from the Technische Informationsbibliothek (TIB) NFDI4Energy Terminology Services (TS) via its API [5] and stored in a database for efficient querying. Preprocessing involves tokenization and n-gram generation. The backend, deployed in Docker containers, includes a term database and the NFDI4Energy Annotator, which handles text processing and annotation. The frontend offers a web-based interface that allows users to input text, select relevant ontologies, and visualize annotated terms in context. Highlighted annotations help users intuitively explore ontology-linked terms. To support integration into research workflows, the tool provides export options in CSV and Excel formats. Currently, annotation quality is evaluated through manual review. Future plans include integrating AI-based methods, such as transformer models, to better capture the semantics of terms in context, and implementing systematic evaluation metrics. We also plan to add user-controlled annotation granularity, allowing users to choose between specific ontology terms and more general parent concepts. By enhancing semantic text processing and linking unstructured text to structured knowledge, the tool supports improved information retrieval, data interoperability, and knowledge discovery in the energy sector.
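The pipeline described above (tokenization, n-gram generation, Levenshtein-based fuzzy matching, CSV export) can be illustrated with a minimal Python sketch. It is not the NFDI4Energy Annotator's actual code. The term table, the placeholder IRIs, the function names, and the relative distance threshold `max_ratio` are all assumptions made for illustration, and the in-memory dictionary stands in for the term database that the tool fills from the terminology service.

```python
import csv
import io
import re

# Hypothetical term table (label -> placeholder IRI). The real tool stores
# labels retrieved from the terminology service in a database instead.
TERMS = {
    "wind turbine": "https://example.org/term/wind-turbine",
    "power plant": "https://example.org/term/power-plant",
    "energy storage": "https://example.org/term/energy-storage",
}

FIELDS = ["text", "start", "end", "label", "iri", "distance"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tokenize(text: str):
    """Return (lowercased token, start offset, end offset) for each word."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def ngrams(tokens, max_n: int):
    """Yield (phrase, start, end) for every n-gram with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            phrase = " ".join(tok for tok, _, _ in window)
            yield phrase, window[0][1], window[-1][2]


def annotate(text: str, terms=TERMS, max_ratio: float = 0.2):
    """Match n-grams against term labels, allowing a small edit distance.

    max_ratio is an assumed tuning knob: a candidate is accepted when its
    distance is at most max_ratio times the label length.
    """
    max_n = max(len(label.split()) for label in terms)
    hits = []
    for phrase, start, end in ngrams(tokenize(text), max_n):
        for label, iri in terms.items():
            dist = levenshtein(phrase, label)
            if dist <= max_ratio * len(label):
                hits.append({"text": text[start:end], "start": start,
                             "end": end, "label": label, "iri": iri,
                             "distance": dist})
    return hits


def to_csv(hits) -> str:
    """Serialize annotations to CSV text using the standard library."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(hits)
    return buf.getvalue()
```

For the input `"The wind turbnes feed an energy storage unit."`, `annotate` links the misspelled phrase "wind turbnes" to the label "wind turbine" (edit distance 2) and "energy storage" to its exact label. The character offsets in each hit are what a frontend would need to highlight the match in context. Character-level distance only bridges spelling and inflection variants, so in this sketch a true synonym would be matched only if it were stored as an additional label in the term table.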
Files
| Name | Size | Checksum |
|---|---|---|
| CoRDI_2025_paper_58.pdf | 187.5 kB | md5:c2eff9f05f64fcc8453a748ee48c6e88 |