AI4DiTraRe: Towards LLM-Based Information Extraction for Standardising Climate Research Repositories
Creators
Description
In the petabyte era of climate research, harmonising diverse environmental and geoscientific datasets is critical for improving data interoperability and supporting effective interdisciplinary studies. This paper proposes the design of an LLM-based tool that extracts and standardises metadata from climate research repositories. The approach leverages the ability of LLMs to adapt to heterogeneous inputs and interpret contextual nuances. By addressing common inconsistencies, such as varying parameters (observation types), units, and definitions, the proposed tool is intended to make data integration more effective. It is a first step towards a unified metadata schema that adheres to the FAIR principles.
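To make the kind of inconsistency mentioned above concrete, the following minimal Python sketch shows what a harmonised record could look like once variant parameter names and units have been mapped to a single form. It is an illustration only and not the tool proposed in the paper: the lookup tables, the `Observation` type, and the `harmonise` function are all hypothetical, and the hand-written dictionaries merely stand in for mappings that an LLM-based extractor might propose.

```python
from dataclasses import dataclass

# Hypothetical lookup tables (assumptions for illustration only). In the
# proposed setting, mappings like these would be suggested by an LLM and
# then checked against a controlled vocabulary.
CANONICAL_PARAMETERS = {
    "air temp": "air_temperature",
    "t2m": "air_temperature",
    "precip": "precipitation_amount",
    "rainfall": "precipitation_amount",
}

# source unit -> (canonical unit, scale, offset),
# where canonical_value = value * scale + offset
UNIT_CONVERSIONS = {
    "degC": ("K", 1.0, 273.15),
    "K": ("K", 1.0, 0.0),
    "cm": ("mm", 10.0, 0.0),
    "mm": ("mm", 1.0, 0.0),
}


@dataclass(frozen=True)
class Observation:
    parameter: str
    value: float
    unit: str


def harmonise(obs: Observation) -> Observation:
    """Map a raw observation onto the canonical parameter name and unit."""
    key = obs.parameter.strip().lower()
    if key not in CANONICAL_PARAMETERS:
        raise KeyError(f"unknown parameter: {obs.parameter!r}")
    if obs.unit not in UNIT_CONVERSIONS:
        raise KeyError(f"unknown unit: {obs.unit!r}")
    unit, scale, offset = UNIT_CONVERSIONS[obs.unit]
    return Observation(CANONICAL_PARAMETERS[key], obs.value * scale + offset, unit)


if __name__ == "__main__":
    # Two repositories describing comparable measurements differently.
    print(harmonise(Observation("T2M", 20.0, "degC")))
    print(harmonise(Observation("Rainfall", 1.5, "cm")))
```

Unknown parameters and units raise an error rather than being guessed, so that unmapped metadata is surfaced for review instead of being silently mislabelled.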
Series information
This position paper was accepted for publication at the First AAAI Bridge on Artificial Intelligence for Scholarly Communication (AI4SC), 25-26 February 2025, Philadelphia, Pennsylvania, USA, co-located with the 39th AAAI Conference on Artificial Intelligence (AAAI-25).
Technical info
This short publication consists of two pages of main body together with two pages of references and an appendix.
Files
Name | Size | Checksum |
---|---|---|
AI4SC_2025_ISE_contribution.pdf | 206.8 kB | md5:4d503533035cf3c7a9f05db1b65f36c2 |
Additional details
Related works
- Continues
- Proposal: 10.5281/zenodo.11109405 (DOI)
- Is described by
- Presentation: 10.5281/zenodo.14925185 (DOI)
Dates
- Accepted
- 2025-02-04: Accepted for publication in the bridge.