Published May 5, 2024 | Version 240505.0
Presentation Open

hostess: Lightweight Distributed Resource Management and Data Processing in Python

  • 1. Million Concepts

Contributors

Contact person:

Project manager:

  • 1. Million Concepts

Description

`hostess` is an open-source library that provides lightweight, Pythonic interfaces for managing and working with distributed resources, system processes, and Python internals. It offers high-level modules for interacting with EC2 instances, S3 buckets, and SSH servers, along with a workflow coordination framework called `station`. It also exposes public APIs for these modules' lower-level building blocks, so users can manipulate resources in whatever context (including locally) and at whatever level of abstraction they prefer.

`hostess` is designed to fit especially smoothly into data analysis workflows. It supports cloud-based data science, including one-line launch of EC2-based Jupyter Notebooks, which permits working with S3-based datasets without incurring large egress fees. It also supports large-scale cloud-based data processing: a single line of code can launch a distributed workflow across an arbitrary number of EC2 instances (or other servers, local or remote).

`hostess` has been used to develop tactical processing pipelines for the VIS instrument suite on the VIPER mission and to process massive NASA data sets from the GALEX and Clementine missions. It is stable, well-documented, and under continuous development. It is available on conda-forge, PyPI, and GitHub (github.com/MillionConcepts/hostess).
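The fan-out pattern mentioned in the description (one call that distributes a workload across many workers) can be illustrated without any cloud resources. The following is a minimal local sketch using only the Python standard library. It does not use or reproduce the `hostess` API: `process_chunk` and `fan_out` are hypothetical names, and local threads stand in for the EC2 instances or other servers a real workflow would target.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk task: sum the squares of the values."""
    return sum(x * x for x in chunk)


def fan_out(chunks, max_workers=4):
    """Run process_chunk on every chunk concurrently.

    Local threads are an assumption standing in for remote workers;
    pool.map returns results in the same order as the input chunks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_chunk, chunks))


# Three illustrative chunks of work, dispatched with a single call.
chunks = [[1, 2, 3], [4, 5], [6]]
results = fan_out(chunks)
print(results)
```

In a real deployment, each worker would be a remote host rather than a thread. The caller's view stays the same: one call hands out the work and collects the results.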

Other

In-depth materials for the 2024 Software for the NASA SMD Workshop.

Files

hostess-ec2 example walkthrough.pdf

Files (1.1 MB)

Checksum Size
md5:eb271712ccb9587fb29d47193eaadecb 495.6 kB
md5:117877c7672452dd08e9842e9a7f4b4e 276.9 kB
md5:d8002a50075f4f35fcf6c0ba65f42469 114.2 kB
md5:da4b16cadaa78e561d95472d571af115 163.5 kB