Building a composable stack for research cyberinfrastructure
Description
Small to medium-sized research projects require increasingly sophisticated software stacks as demand grows for high performance computing (HPC) resources and Kubernetes clusters for web-based applications. These smaller projects frequently lack funding for dedicated DevOps engineers, so their research software engineers (RSEs) must take on that role. Given the scarcity of RSEs, the effort required to manually provision each layer of this stack, from cluster node operating system configuration to application deployment, will become infeasible without force-multiplying innovations. These tasks are often done early in a project and must be relearned for the next one. Additionally, without a DevOps engineer's expertise, the work of securing these systems and upgrading them during the project falls on the RSE, further reducing the already scarce time available to develop the application.
We present the approach developed at NCSA to address this problem: a GitOps-based method of bootstrapping virtual computing resources and Kubernetes clusters for composable deployment of collaborative tools and services. Leveraging industry-standard software solutions, we provide a free and open source foundation upon which open science can flourish, with an emphasis on decentralized applications and protocols where possible. On top of this infrastructure we can add new layers, called DecentCI [1], that allow an RSE to quickly get a complex system up and running, with shared access to data, forums for sharing ideas, private messaging, websites, etc.
Building on the knowledge gained from many projects, we have created a set of recipes that allow a new project to be up and running in under 30 minutes. For example, in the case of Kubernetes, nodes are created and configured, and clusters are initialized with ingress controllers, secret management, storage classes, etc. (all of this is configurable on a per-cluster basis). Deployed clusters can easily be upgraded by applying newer versions of the centrally managed modules, and functionality added centrally can be rolled out to the clusters over time.
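As a rough illustration of the per-cluster configurability described above, the following Python sketch merges centrally managed defaults with one cluster's overrides. The key names (`ingress_controller`, `secret_management`, `storage_class`, `module_version`), their values, and the function `compose_cluster_config` are hypothetical and chosen only for illustration; they are not taken from the NCSA modules, which are written in HCL and YAML.

```python
from copy import deepcopy

# Hypothetical centrally managed defaults (illustrative names and values only).
CENTRAL_DEFAULTS = {
    "module_version": "1.0",
    "ingress_controller": {"enabled": True},
    "secret_management": {"enabled": True},
    "storage_class": {"enabled": True, "default": "standard"},
}


def compose_cluster_config(defaults, overrides):
    """Return a copy of defaults, recursively updated with per-cluster overrides."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = compose_cluster_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A hypothetical cluster that disables ingress and picks another storage class.
cluster_a = compose_cluster_config(
    CENTRAL_DEFAULTS,
    {"ingress_controller": {"enabled": False}, "storage_class": {"default": "fast"}},
)
print(cluster_a)
```

Under this model, an upgrade would amount to changing the central defaults (for example `module_version`) and composing again, while each cluster's overrides stay untouched.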
During this talk we will discuss which tools are centrally managed and which are installed in each cluster. We will describe how an RSE can add their applications to the system, use well-understood Git workflows to deploy new applications, and work with other RSEs on the project. The end goal is a decentralized system that empowers RSEs to deliver new applications to scientists faster and more securely in support of their research.
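The general GitOps idea behind such a workflow can be sketched as a comparison between the desired state recorded in a Git repository and the state currently running in a cluster. The Python below is a minimal sketch under that assumption; the function `plan_reconcile`, the application names, and the version strings are all hypothetical, and it does not represent the specific tooling used at NCSA.

```python
def plan_reconcile(desired, live):
    """Compare desired state (as committed to Git) with live state.

    Both arguments map an application name to a version string.
    Returns a list of (action, name, version) tuples.
    """
    actions = []
    for name in sorted(desired):
        if name not in live:
            actions.append(("install", name, desired[name]))
        elif live[name] != desired[name]:
            actions.append(("upgrade", name, desired[name]))
    for name in sorted(live):
        if name not in desired:
            # Present in the cluster but no longer declared in Git.
            actions.append(("remove", name, live[name]))
    return actions


# Hypothetical states: what Git declares versus what the cluster runs.
desired = {"forum": "2.1", "website": "1.4", "chat": "0.9"}
live = {"forum": "2.0", "website": "1.4", "legacy-wiki": "0.3"}

plan = plan_reconcile(desired, live)
for action in plan:
    print(action)
```

In this picture, an RSE deploys or upgrades an application by committing a change to the desired state, and the ordinary Git review process doubles as the deployment review.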
References
- [1] T. Andrew Manning, “Embracing the (re)decentralized web for sustainable research collaboration cyberinfrastructure”, US-RSE 2023
Files
Name | Size |
---|---|
20241017 US-RSE Conference presentation.pdf (md5:306e142918e18b1b434b21920989903d) | 1.4 MB |
Additional details
Funding
- U.S. National Science Foundation
- Frameworks: MUSES, Modular Unified Solver of the Equation of State 2103680
- U.S. National Science Foundation
- Frameworks: SCiMMA: Real-time Orchestration of Multi-Messenger Astrophysical Observations 2311355
- National Institute of Standards and Technology
- Center of Excellence & IN-CORE 70NANB15H044
- U.S. National Science Foundation
- Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus 1835834
Dates
- Created
- 2024-06-18
Software
- Repository URL
- https://github.com/ncsa/radiant-cluster-template
- Programming language
- HCL, YAML
- Development Status
- Active