Published October 8, 2022 | Version v1
Presentation Open

NSDF-Catalog: Toward a Lightweight Indexing Service for the National Science Data Fabric

  • 1. University of Tennessee Knoxville
  • 2. University of Utah
  • 3. University of Michigan
  • 4. Sandia National Laboratories
  • 5. SDSC

Description

Across domains, massive amounts of scientific data are generated that are useful beyond their original purpose. Yet these data are often hard to discover, especially for researchers and students from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative (http://nationalsciencedatafabric.org/), we developed a testbed to demonstrate that these boundaries can be overcome. In the course of this effort, we identified the need to index large amounts of scientific data across scientific domains.

Instead of waiting for the development of a cross-domain metadata convention, we propose to build a lightweight indexing service with minimal metadata that complements existing domain-specific, rich-metadata collection efforts. The NSDF-Catalog is designed as a flexible microservice that serves multiple related objectives: 1) coordinate data movement and replication from origin repositories within the NSDF federation; 2) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and 3) provide a suite of tools for discovering datasets for cross-disciplinary research. Our service indexes at fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience of data consumers, with the goal of enabling end-to-end workflow optimizations.
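
To make the idea of file-level indexing with minimal metadata concrete, the following is a minimal sketch in Python. It is not the NSDF-Catalog implementation: the class names, the three record fields (repository, key, size_bytes), and the sample entries are all assumptions chosen for illustration, and an in-memory list stands in for whatever storage a real service would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One indexed file or object; every field name here is hypothetical."""
    repository: str   # origin repository the object lives in
    key: str          # path or object key within that repository
    size_bytes: int   # object size, useful when planning data movement


class MiniCatalog:
    """Toy in-memory index standing in for a real catalog service."""

    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def search(self, term):
        """Return entries whose key contains the term (case-insensitive)."""
        needle = term.lower()
        return [e for e in self._entries if needle in e.key.lower()]

    def bytes_per_repository(self):
        """Inventory view: total indexed bytes for each origin repository."""
        totals = defaultdict(int)
        for e in self._entries:
            totals[e.repository] += e.size_bytes
        return dict(totals)


# Made-up sample records, not real NSDF data.
catalog = MiniCatalog()
catalog.add(CatalogEntry("repo-a", "climate/run1/temperature.nc", 1_000))
catalog.add(CatalogEntry("repo-a", "climate/run1/pressure.nc", 2_000))
catalog.add(CatalogEntry("repo-b", "materials/scan_001.tiff", 500))
```

In this sketch, search corresponds loosely to the discovery objective and bytes_per_repository to the inventory objective; a record this small carries no domain-specific metadata, which is the trade-off the lightweight approach accepts.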

Files

eScience22_NSDF-Catalog_Toward-Lightweight-Indexing.pdf

Files (2.2 MB)