An Adaptable Indexing Pipeline for Enriching Meta Information of Datasets from Heterogeneous Repositories
Description
Dataset repositories continuously publish a large number of datasets
across a variety of domains, such as biodiversity and oceanography.
To conduct multidisciplinary research, scientists and practitioners
must discover datasets from various disciplines that are unfamiliar
to them. Well-known search engines, such as Google Dataset Search and
Mendeley Data, try to support researchers with cross-domain dataset
discovery based on dataset contents. However, as datasets typically contain
scientific observations or data collected from service providers, their
contextual information is limited. Accordingly, indexing datasets
effectively enough to increase their Findability, Accessibility,
Interoperability, and Reusability (FAIRness) based on their contextual
information can be impossible.
This paper presents an indexing pipeline that extends the contextual
information of datasets according to their scientific domains, using topic
modeling together with a set of rules and domain keywords (such as essential
variables in environmental science) suggested by domain experts. The
pipeline relies on an open ecosystem, where dataset providers publish
semantically enhanced metadata on their data repositories. We aggregate,
normalize, and reconcile such metadata, providing a dataset search
engine that enables research communities to find, access, integrate, and
reuse datasets. We evaluated our approach against a manually created gold
standard and in a user study.
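As a rough illustration of the normalize-and-enrich idea described above, the following minimal sketch maps records from differently structured repositories onto a common schema and tags them with domain keywords. It is not the pipeline from the paper: the field names, the `DOMAIN_KEYWORDS` lists, the `normalize` and `enrich` helpers, and the sample record are all hypothetical, and simple phrase matching stands in for the topic modeling and expert rules.

```python
import re

# Hypothetical keyword lists; in practice these would be suggested by domain experts.
DOMAIN_KEYWORDS = {
    "oceanography": ["sea surface temperature", "salinity", "ocean current"],
    "biodiversity": ["species abundance", "habitat", "taxonomy"],
}


def normalize(record):
    """Map a repository-specific record onto a small common schema (assumed fields)."""
    title = record.get("title") or record.get("name") or ""
    description = record.get("description") or record.get("abstract") or ""
    return {"title": title.strip(), "description": description.strip()}


def enrich(record):
    """Attach every domain whose keywords occur in the title or description."""
    text = f"{record['title']} {record['description']}".lower()
    matches = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        # Whole-phrase match so that a keyword is not found inside a longer word.
        found = [
            kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", text)
        ]
        if found:
            matches[domain] = found
    return {**record, "domains": sorted(matches), "matched_keywords": matches}


# Made-up sample record using repository-specific field names.
raw = {
    "name": "Argo float profiles",
    "abstract": "Salinity and sea surface temperature measurements.",
}
indexed = enrich(normalize(raw))
print(indexed["domains"])
```

The enriched record keeps the matched keywords alongside the domain labels, so a search index built on top of it could expose both as facets.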
Files
Name | Size |
---|---|
2022.conference.akdd.caera.pdf (md5:54a487ac165b3f6ea22f4c9b2c12ca0d) | 294.5 kB |
Additional details
Funding
- ARTICONF – smART socIal media eCOsytstem in a blockchaiN Federated environment 825134
- European Commission
- Blue Cloud – Blue-Cloud: Piloting innovative services for Marine Research & the Blue Economy 862409
- European Commission
- ENVRI-FAIR – ENVironmental Research Infrastructures building Fair services Accessible for society, Innovation and Research 824068
- European Commission