Published June 8, 2022 | Version v1
Presentation | Open Access

Who is Afraid of a Petabyte Dataset? Rethinking Repository Infrastructures and Curation Workflows for the Scale and Type of Next Generation Data

  • 1. Oak Ridge National Laboratory

Description

Oak Ridge National Laboratory (ORNL), the US Department of Energy’s largest multi-program laboratory, hosts researchers who are tackling some of the world’s greatest challenges across a wide range of domains. ORNL operates world-leading facilities for neutron scattering, materials and nanoscale research, additive manufacturing, and high-performance computing, among others. As ORNL embarks on a new initiative to connect instruments to compute resources, and as machine learning (ML) and robotic automation advance, storing, sharing, and preserving the resulting massive datasets raises several key challenges. Constellation (https://doi.ccs.ornl.gov), our public data repository hosted at the Oak Ridge Leadership Computing Facility (OLCF), is being redesigned to address the next-generation challenges of curating, sharing, and preserving petabyte-scale files. In our presentation, we will discuss the challenges we face and our proposed three-pronged approach: i) creatively repurposing existing tools and community development for automated curation; ii) identifying data “tiers,” or stages, based on preservation and discoverability needs; and iii) creating a domain-specific knowledge portal.

Files (4.7 MB)

OR_Presentation_A_May.pptx.pdf (4.7 MB)
md5:49fc6164f0912622f7ff7b4ef92bf8e7