
Published November 28, 2021 | Version v1
Journal article | Open Access

MASHUP: Making Serverless Computing Useful for HPC Workflows

  • 1. Northeastern University

Description

MASHUP executes HPC/scientific workflows in a hybrid execution environment consisting of VM nodes and a serverless platform. For every task in the Directed Acyclic Graph (DAG) of a workflow, MASHUP decides which execution environment (VM nodes versus serverless platform) is more suitable, and executes the task accordingly to reduce both the execution time and the expense incurred by the end-user for running the workflow.
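The per-task decision can be pictured as a trade-off between the measured time and expense of the two environments. A minimal sketch in Python (the function name, score weighting, and normalization are illustrative assumptions, not MASHUP's actual policy):

```python
def choose_platform(vm_time, vm_cost, sls_time, sls_cost, time_weight=0.5):
    """Pick 'vm' or 'serverless' for one DAG task by a weighted score
    over normalized execution time and user expense (hypothetical policy)."""
    # Normalize each metric so time and cost contribute comparably.
    t_total = (vm_time + sls_time) or 1.0
    c_total = (vm_cost + sls_cost) or 1.0
    vm_score = time_weight * vm_time / t_total + (1 - time_weight) * vm_cost / c_total
    sls_score = time_weight * sls_time / t_total + (1 - time_weight) * sls_cost / c_total
    # Lower combined score wins; ties favor the VM cluster.
    return "vm" if vm_score <= sls_score else "serverless"
```

A task that is cheap but slow on serverless, for example, may still be placed on the VM cluster if the time weight dominates.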

Evaluated Benchmarks

MASHUP is evaluated with three HPC workflows: 1000Genome, SRASearch, and Epigenomics. These workflows are widely used by the HPC community for benchmarking purposes; they vary significantly in their compute characteristics and include all the significant components of scientific workflows.

Implementation Details

MASHUP is implemented in Python 3.6 and is easily portable across multiple cloud providers. As a dependency, it only requires the command-line interface (CLI) of the respective cloud provider to be installed and configured with the user's account credentials. MASHUP sets up VM nodes according to the user's chosen VM family and number of nodes. Then, MASHUP sets up the tasks of the workflow on the cluster as well as on the cloud provider's serverless platform. It then configures a remote storage to be used for exchanging data among tasks running on different platforms and loads the initial input data into it. The Placement Decision Controller (PDC) of MASHUP spawns the tasks of the workflow once on the serverless platform and once on the VM cluster and decides the optimal execution environment for each task. MASHUP then spawns each task in its optimal execution environment, following the precedence pattern of the workflow DAG. MASHUP also takes precautions, such as maintaining redundant remote storage and standby VMs that can control the execution of the workflow, in the event of a failure of the master node.
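The execution phase described above amounts to a topological walk over the workflow DAG, launching each task in whichever environment the PDC selected. A self-contained sketch (the DAG encoding, function names, and runner interface are assumptions for illustration):

```python
from collections import deque

def topological_order(dag):
    """dag: {task: [successor, ...]}. Return tasks in precedence order
    using Kahn's algorithm."""
    indegree = {t: 0 for t in dag}
    for succs in dag.values():
        for s in succs:
            indegree[s] += 1
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in dag[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

def run_workflow(dag, placement, runners):
    """placement: {task: 'vm' | 'serverless'}; runners maps a platform
    name to a callable that executes one task on that platform."""
    for task in topological_order(dag):
        runners[placement[task]](task)
```

Here the PDC's output is simply the `placement` dictionary; the real system additionally moves intermediate data through the shared remote storage between dependent tasks.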

Workflow Setup Procedure

Workflows are individually set up in a VM cluster and a serverless platform for their hybrid execution.
(1) Set up AWS CLI access keys and permissions:

    pip3 install boto3
    pip3 install awscli
    aws configure
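Since MASHUP drives AWS through boto3, a quick way to confirm the configured credentials work is an STS identity call. A small sketch (the function name is ours; the client is injected so the check can be exercised without live credentials):

```python
def verify_aws_identity(sts_client):
    """Return the AWS account ID the configured credentials belong to.
    Pass a real STS client, e.g. boto3.client('sts'); injecting it
    keeps this helper testable with a stub."""
    return sts_client.get_caller_identity()["Account"]
```

With credentials configured via `aws configure`, calling `verify_aws_identity(boto3.client("sts"))` should return your 12-digit account ID rather than raising an authentication error.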

(2) Set up AWS execution role:
Set up an execution role from the AWS dashboard that grants Lambda full access to S3.
(3) Set up workflows on AWS Lambda (serverless platform)
    cd ./serverless_build/<workflow_name>/<task_name>/
    aws lambda create-function --function-name <task_name> \
        --role <lambda_s3_role_id> --handler test.main \
        --memory-size 2048 --runtime python3.6 --timeout 900 \
        --zip-file fileb://test.zip

Inside each <workflow_name> directory under serverless_build, an individual directory is provided for every task of the workflow. These directories contain the pre-compiled binaries for each task inside the test.zip file. The above command ships the files of each task to the AWS Lambda serverless platform for deployment. For <task_name>, provide the name of the task, and for <lambda_s3_role_id>, provide the ID of the execution role created in step (2). After deployment, the input files of the tasks are stored in an S3 bucket.
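The CLI flags above map one-to-one onto the Lambda CreateFunction API, so the same deployment can be scripted from Python. A sketch assembling those parameters (the helper name is ours; the role ARN and zip contents are placeholders):

```python
def lambda_deploy_params(task_name, role_arn, zip_bytes):
    """Build the keyword arguments for Lambda's CreateFunction call,
    mirroring the CLI flags used in step (3)."""
    return {
        "FunctionName": task_name,        # --function-name
        "Role": role_arn,                 # --role
        "Handler": "test.main",           # --handler
        "MemorySize": 2048,               # --memory-size (MB)
        "Runtime": "python3.6",           # --runtime
        "Timeout": 900,                   # --timeout (seconds, Lambda max)
        "Code": {"ZipFile": zip_bytes},   # --zip-file fileb://test.zip
    }
```

The resulting dictionary can be passed as `boto3.client("lambda").create_function(**params)` once the task's test.zip has been read into memory.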
(4) Set up workflows on AWS EC2 (VM cluster):
    cd ./vm_build/<workflow_name>/
    scp -r ./* ec2-user@<master-node-ip>:/home/user/
Set up the EC2 VMs with the required number of nodes, using Amazon Linux 2 or any version of Ubuntu as the OS. The scp command transfers all the pre-compiled binaries and input data of the tasks in a workflow. <master-node-ip> is the public IP address of the master node of the VM cluster.

Run MASHUP

python3 ./<workflow_name>/test.py

This command requires only one user input: inside the "instance_id" variable of test.py, write the Instance ID of the master node of the VM cluster. This ID is available on the EC2 dashboard of AWS. The command runs the entire workflow in two phases. The first phase is the Placement Decision Controller (PDC) phase, where MASHUP decides the placement (VM node versus serverless platform) for each of the tasks. In the second phase, MASHUP runs the workflow with the placement decided in the first phase.
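Given the Instance ID, the master node's public IP (needed for the scp step above) can also be looked up programmatically via the EC2 DescribeInstances API. A sketch (the helper name is ours; the client is injected so it can be stubbed):

```python
def master_public_ip(ec2_client, instance_id):
    """Resolve the public IP of the master node from its Instance ID,
    as shown on the EC2 dashboard. Pass a real client, e.g.
    boto3.client('ec2'); injected here for testability."""
    resp = ec2_client.describe_instances(InstanceIds=[instance_id])
    # DescribeInstances nests instances under reservations.
    return resp["Reservations"][0]["Instances"][0]["PublicIpAddress"]
```

This avoids copying the IP from the dashboard by hand when the cluster is recreated between experiments.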

Directory and File Structure

The project contains the following directory structure:
./<workflow_name>/exp1/ : experiments with the r5 family of VM instances
./<workflow_name>/exp2/ : experiments with the m5 family of VM instances
./<workflow_name>/exp3/ : experiments with the r5b family of VM instances
./<workflow_name>/<exp#>/<instance_type>/ : results/experimental data for different numbers of nodes (from 2 to 96 nodes)
./<workflow_name>/<exp#>/<instance_type>/hybrid : results of MASHUP without the PDC (tasks with more parallel components than the number of VM nodes are executed on the serverless platform; the rest on the VM cluster)
./<workflow_name>/<exp#>/<instance_type>/hybrid+ : results of MASHUP without the PDC on 2 separate, independent clusters
./<workflow_name>/<exp#>/<instance_type>/only_ec2 : results of executing all the tasks of a workflow on the EC2 VM cluster
./<workflow_name>/<exp#>/<instance_type>/technique : results of MASHUP with the PDC
./<workflow_name>/<exp#>/<instance_type>/technique+ : results of MASHUP with the PDC on 2 separate, independent clusters
./<workflow_name>/<exp#>/only_lambda : results of executing all the tasks of the workflow on the serverless platform
./<workflow_name>/profiling/<task_name> : CPU, I/O, and memory performance profiling data of the tasks
./figures : Contains the result figures used in the paper.

Each of the innermost directories contains the following files:
cost.txt : expense of execution on the VM cluster and on the serverless platform, individually.
runtime_per_phase.txt : execution time of each phase of the workflow.
<task_name>_response.txt : output from a serverless execution, containing the execution time, read time, and write time for each serverless function.
<task_name>_output.txt : output from a VM cluster execution, containing the execution time, read time, and write time for the task.

MASHUP Evaluation Platform

We use Amazon Web Services (AWS) EC2 VMs to create the VM cluster and AWS Lambda for serverless execution of the tasks in an HPC workflow. We evaluate MASHUP with different EC2 VM families, such as r5b.large, r5.large, and m5.large. Our Lambda functions have a typical memory size of 3 GB each. For our experiments, we vary the number of nodes in the VM cluster from 2 to 96. For storage and communication between Lambda functions, we use AWS S3 buckets.

Evaluation Metrics

We compare the performance of MASHUP and all competing techniques in terms of improvement in workflow execution time and execution expense over a traditional VM cluster-based execution.

Files

MashUP.zip (61.0 MB)
md5:b9e96748b36c7916b93c0df286b69499