Published October 28, 2024 | Version v1
Presentation Open

Building a composable stack for research cyberinfrastructure

  • 1. National Center for Supercomputing Applications

Description

Small to medium-sized research projects require increasingly sophisticated software stacks as demand grows for high performance computing (HPC) resources and Kubernetes clusters for web-based applications. These smaller projects frequently lack funding for dedicated DevOps engineers, so their research software engineers (RSEs) must take on that role. The effort required to manually provision each layer of this stack, from cluster node operating system configuration to application deployment, will become infeasible without force-multiplying innovations, especially given the scarcity of RSEs. These tasks are often done once, early in a project, and must be relearned for the next one. In addition, work that would normally draw on a DevOps engineer's expertise, such as securing these systems and upgrading them during the project, falls on the RSE, further reducing the already scarce time available to develop the application.

We present the approach developed at NCSA to address this problem: a GitOps-based method of bootstrapping virtual computing resources and Kubernetes clusters for composable deployment of collaborative tools and services. Leveraging industry-standard software solutions, we provide a free and open source foundation upon which open science can flourish, with an emphasis on decentralized applications and protocols where possible. On top of this infrastructure we can add new layers, called DecentCI [1], that allow an RSE to quickly get a complex system up and running, with shared access to data, forums for sharing ideas, private messaging, websites, etc.

Building on the knowledge gained from many projects, we have created a set of recipes that allow a new project to be up and running in under 30 minutes. For example, in the case of Kubernetes, nodes are created and configured, and clusters are initialized with ingress controllers, secret management, storage classes, etc. (all of this is configurable on a per-cluster basis). Deployed clusters can easily be upgraded by applying newer versions of the centrally managed modules, and functionality added centrally can be rolled out to the clusters over time.
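As a rough illustration of per-cluster configurability, the following minimal Python sketch layers cluster-specific overrides on top of centrally managed defaults. It is not the NCSA implementation (the repository listed below is written in HCL and YAML); the component keys, the `default_class` setting, and the `compose_cluster_config` helper are all hypothetical names chosen for this example.

```python
from copy import deepcopy

# Hypothetical centrally managed defaults; component names and settings
# are illustrative assumptions, not taken from the actual templates.
CENTRAL_DEFAULTS = {
    "ingress_controller": {"enabled": True},
    "secret_management": {"enabled": True},
    "storage_classes": {"enabled": True, "default_class": "standard"},
}


def compose_cluster_config(defaults, overrides):
    """Return a copy of the defaults with per-cluster overrides applied.

    Overrides are merged per component, so a cluster only needs to list
    the settings that differ from the central defaults.
    """
    config = deepcopy(defaults)
    for component, settings in overrides.items():
        config.setdefault(component, {}).update(settings)
    return config


# A hypothetical cluster that changes only its default storage class.
cluster_a = compose_cluster_config(
    CENTRAL_DEFAULTS,
    {"storage_classes": {"default_class": "fast"}},
)
```

Because the defaults are copied rather than modified, a later central update to `CENTRAL_DEFAULTS` would flow into every cluster the next time its configuration is composed, while each cluster's overrides stay local.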

During this talk we will discuss which tools are centrally managed and which tools are installed in each cluster. We will describe how an RSE can add their applications to the system, use well-understood Git workflows to deploy new applications, and work with other RSEs on the project. The end goal is a decentralized system that empowers RSEs to deliver new applications to scientists faster and more securely in support of their research.
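The general GitOps idea behind such a workflow can be sketched in a few lines of Python: the desired state is declared in a Git repository, and a reconciler compares it with what is running in the cluster to decide what to change. This is a simplified sketch of the pattern only, not the tooling used in the talk; the application names, version strings, and the `plan_changes` function are hypothetical.

```python
def plan_changes(desired, actual):
    """Compare desired state (as declared in Git) with actual cluster state.

    Both arguments map application name -> version. Returns a list of
    (action, name, version) tuples describing how to reach the desired state.
    """
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("install", name, version))
        elif actual[name] != version:
            actions.append(("upgrade", name, version))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("remove", name, actual[name]))
    return actions


# Hypothetical state: what the Git repository declares vs. what is running.
desired = {"forum": "2.1", "website": "1.0", "chat": "0.9"}
actual = {"forum": "2.0", "website": "1.0", "wiki": "3.2"}

plan = plan_changes(desired, actual)
# plan == [("install", "chat", "0.9"),
#          ("upgrade", "forum", "2.1"),
#          ("remove", "wiki", "3.2")]
```

In this model, deploying a new application amounts to merging a change that adds an entry to the desired state, which is why ordinary Git review and branching practices carry over to deployment.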

References

  1. T. Andrew Manning, “Embracing the (re)decentralized web for sustainable research collaboration cyberinfrastructure”, US-RSE 2023

Files

20241017 US-RSE Conference presentation.pdf (1.4 MB)
md5:306e142918e18b1b434b21920989903d

Additional details

Funding

U.S. National Science Foundation
Frameworks: MUSES, Modular Unified Solver of the Equation of State 2103680
U.S. National Science Foundation
Frameworks: SCiMMA: Real-time Orchestration of Multi-Messenger Astrophysical Observations 2311355
National Institute of Standards and Technology
Center of Excellence & IN-CORE 70NANB15H044
U.S. National Science Foundation
Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus 1835834

Dates

Created
2024-06-18

Software

Repository URL
https://github.com/ncsa/radiant-cluster-template
Programming language
HCL, YAML
Development Status
Active
