Published February 4, 2025 | Version Camera-ready
Conference paper Open

AI4DiTraRe: Towards LLM-Based Information Extraction for Standardising Climate Research Repositories

Description

In the petabyte era of climate research, harmonising diverse environmental and geoscientific datasets is critical for improving data interoperability and supporting effective interdisciplinary studies. This paper proposes the design of an LLM-based tool that extracts and standardises metadata from climate research repositories. The approach leverages the adaptability of LLMs and their ability to capture contextual nuances. By addressing common inconsistencies such as varying parameters (observation types), units, and definitions, the proposed tool is intended to make data integration substantially more effective. It is a first step towards a unified metadata schema that adheres to the FAIR principles.
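
To make the kind of inconsistency mentioned above concrete, the following is a minimal sketch of what harmonising parameter names and units could look like once metadata has been extracted. It is not the tool described in the paper: the hand-written lookup tables stand in for the LLM-based extraction step, and every name here (`PARAMETER_ALIASES`, `UNIT_CONVERSIONS`, `Observation`, `harmonise`) as well as the choice of canonical names and units is a hypothetical assumption for illustration only.

```python
from dataclasses import dataclass

# Hypothetical alias table: repository-specific parameter labels
# (lower-cased) mapped to one canonical parameter name.
PARAMETER_ALIASES = {
    "air temp": "air_temperature",
    "t_air": "air_temperature",
    "temperature (air)": "air_temperature",
    "precip": "precipitation_amount",
    "rainfall": "precipitation_amount",
}

# Hypothetical unit table: source unit -> (canonical unit, scale, offset),
# applied as canonical_value = value * scale + offset.
UNIT_CONVERSIONS = {
    "degC": ("K", 1.0, 273.15),
    "K": ("K", 1.0, 0.0),
    "mm": ("m", 0.001, 0.0),
    "m": ("m", 1.0, 0.0),
}


@dataclass(frozen=True)
class Observation:
    parameter: str
    value: float
    unit: str


def harmonise(obs: Observation) -> Observation:
    """Map one observation onto the canonical parameter name and unit."""
    key = obs.parameter.strip().lower()
    # Unknown labels are passed through (lower-cased) rather than guessed.
    parameter = PARAMETER_ALIASES.get(key, key)
    if obs.unit not in UNIT_CONVERSIONS:
        raise ValueError(f"unknown unit: {obs.unit!r}")
    unit, scale, offset = UNIT_CONVERSIONS[obs.unit]
    return Observation(parameter, obs.value * scale + offset, unit)
```

Under these assumed tables, records labelled `T_air` in `degC` and `air temp` in `K` would both end up as `air_temperature` in kelvin, which is the sort of alignment a unified metadata schema requires.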

Series information

This position paper was accepted for publication at the First AAAI Bridge on Artificial Intelligence for Scholarly Communication (AI4SC), 25-26 February 2025, Philadelphia, Pennsylvania, USA, co-located with the 39th AAAI Conference on Artificial Intelligence (AAAI-25).

Technical info

This short publication consists of two pages of main body together with two pages of references and an appendix.

Files

AI4SC_2025_ISE_contribution.pdf (206.8 kB)
md5:4d503533035cf3c7a9f05db1b65f36c2

Additional details

Related works

Continues
Proposal: 10.5281/zenodo.11109405 (DOI)
Is described by
Presentation: 10.5281/zenodo.14925185 (DOI)

Funding

Leibniz Association
Leibniz Science Campus "Digital Transformation of Research" W74/2022

Dates

Accepted
2025-02-04
Accepted for publication in the bridge.