hostess: Lightweight Distributed Resource Management and Data Processing in Python
Contributors
Contact person:
Project manager:
Researcher:
- 1. Million Concepts
Description
`hostess` is an open-source library that provides lightweight, Pythonic interfaces for managing and working with distributed resources, system processes, and Python internals. It offers high-level modules for interacting with EC2 instances, S3 buckets, and SSH servers, along with a workflow coordination framework called `station`. It also exposes public APIs for the lower-level building blocks of these modules, so users can work with resources in whatever context they prefer (including locally) and at whatever level of abstraction suits them.

`hostess` is designed to fit especially smoothly into data analysis workflows. It supports cloud-based data science, including one-line launch of Jupyter Notebooks on EC2 instances, which lets users work with S3-hosted datasets without incurring massive egress fees. It also supports large-scale cloud-based data processing: a single line of code can launch a distributed workflow across an arbitrary number of EC2 instances (or other servers, local or remote). It has been used to develop tactical processing pipelines for the VIS instrument suite on the VIPER mission and to process large NASA data sets from the GALEX and Clementine missions. It is stable, well documented, and under continuous development, and it is available on conda-forge, PyPI, and GitHub (github.com/MillionConcepts/hostess).
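The fan-out/fan-in pattern behind launching one workflow across many workers can be sketched with the Python standard library alone. This is not `hostess`'s API and does not show how `station` works internally: `split_into_chunks`, `process_chunk`, and `run_distributed` are hypothetical names, and local threads stand in for EC2 instances or other servers.

```python
# Hypothetical sketch of fan-out/fan-in task distribution.
# NOT the hostess API: names are illustrative, and local threads
# stand in for remote workers (e.g. EC2 instances).
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: list[int]) -> int:
    # Stand-in for a real per-chunk processing step (hypothetical).
    return sum(x * x for x in chunk)


def split_into_chunks(items: list[int], n_workers: int) -> list[list[int]]:
    # Round-robin partition so each worker gets a roughly equal share.
    return [items[i::n_workers] for i in range(n_workers)]


def run_distributed(
    task: Callable[[list[int]], int], items: list[int], n_workers: int
) -> list[int]:
    # Fan out: submit one chunk per worker.
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(task, chunk) for chunk in chunks]
        # Fan in: collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = run_distributed(process_chunk, list(range(10)), n_workers=3)
    print(sum(results))  # 285
```

In a real deployment each worker would run on a remote host and the coordinator would only wait on its results; threads are used here because they need no pickling or process start-up and keep the sketch runnable anywhere.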
Other
In-depth materials for the 2024 Software for the NASA SMD Workshop.