# JPL setup for AWS EC2 instances

(valid as of 2024-05-10) 

If you are based at JPL and setting up an AWS EC2 instance, a few extra steps are needed so that the instance complies with JPL's security requirements and has `ssh` access enabled (it is no longer enabled by default). Follow these steps in place of Steps 2 and 3 of the [AWS Cloud: getting started](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial for general users.

## Step 2: Start a JPL EC2 instance

As a JPL user, more than likely you will be added as a user to an existing project AWS account rather than creating a new account. The account owner will need to [add you](https://wiki.jpl.nasa.gov/display/cloudcomputing/Granting+access+to+the+AWS+console+to+other+JPL+users) to the account as a `power_user`. Make sure the account owner/manager is OK with your use of it, as you *will* incur costs running your EC2 instance (free tier accounts do not have sufficient memory for JPL EC2 instances).

Once you are a user on a JPL AWS account, make sure you are connected to the JPL network (with VPN if not at the lab), and go to the JPL AWS console [sign-in page](https://sso3.jpl.nasa.gov/awsconsole). Bookmark the sign-in page, as you will be using it again and it is not the easiest to find. Once you have signed in, you should be at a screen with the title Console Home. First, let's make sure you are in the best AWS "region" for accessing PO.DAAC datasets, which are hosted in region *us-west-2 (Oregon)*. In the upper-right corner of the page, just to the left of your username, there is a drop-down menu with a place name on it. Select the **US West (Oregon) us-west-2** region.

Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the JPL Cloud Computing Team (see [here](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline) for more info). In the AWS console, click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not, make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to see the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:

*Name and tags*: Whatever you want (e.g., ECCO tutorials).

*Application and OS images (Amazon Machine Image)*: Leave unchanged.

*Instance type*: **t2.medium**/**t3.medium** or larger is recommended, and probably necessary to run a JPL-based EC2 instance successfully. (**t3** is a newer generation, with similar or slightly lower costs than **t2**.)

*Key pair (login)*: Click on **Create new key pair**. In the pop-up window, make the name whatever you want (e.g., aws_ec2_jupyter), select *Key pair type*: **RSA** and *Private key file format*: **.pem**, then click **Create key pair**. This downloads the key file to your Downloads folder, and you should move it to your `.ssh` folder: `mv ~/Downloads/aws_ec2_jupyter.pem ~/.ssh/`. Then change the permissions to read-only for the file owner: `chmod 400 ~/.ssh/aws_ec2_jupyter.pem`.
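
The key-file steps can also be wrapped in a small helper. This is only a sketch: `install_key` is a hypothetical function name, and it additionally creates `~/.ssh` with owner-only permissions in case the folder does not exist yet, which the steps above assume it does.

```shell
# install_key SRC [DEST_DIR]: move a downloaded .pem key into DEST_DIR
# (default ~/.ssh) and make it readable only by its owner.
# "install_key" is a hypothetical helper name, not part of the tutorial repo.
install_key() {
    src="$1"
    dest_dir="${2:-$HOME/.ssh}"
    # Create the destination folder if needed and restrict it to the owner.
    mkdir -p "$dest_dir" && chmod 700 "$dest_dir" || return 1
    mv "$src" "$dest_dir/" || return 1
    # Read-only for the owner, no access for anyone else.
    chmod 400 "$dest_dir/$(basename "$src")"
}

# Example call with the key name used above (uncomment to run):
# install_key ~/Downloads/aws_ec2_jupyter.pem
# ls -l ~/.ssh/aws_ec2_jupyter.pem   # the mode should begin with -r--------
```

If the permissions on the key file are looser than this, `ssh` will typically refuse to use it.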

*Network settings*: Look at **Select existing security group** to see if you can use a security group that has VPC: vpc-0161fa19cefbd9635. If not, you can try **Create security group** and make sure that the boxes for allowing SSH, HTTPS, and HTTP traffic are checked. If you have issues launching or accessing your instance, you may need to consult with another JPL user or submit a ticket to [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

*Configure storage*: Specify a storage volume with at least **16 GiB gp3** as your root volume. This is important, since the python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you get an error message about having too little storage when you launch your instance, you need to edit your instance config to have at least the minimum amount of storage for the AMI you are using.

*Advanced details*: You need to include an IAM profile with your instance. Check the *IAM instance profile* dropdown menu to see if there is one associated with your security group (it might have a title like **SRV-standard-instance-profile**). If you cannot select an IAM profile, check with other account users or [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

Finally, at the bottom-right of the page click the yellow **Launch instance** button. Wait a minute or two for the instance to initialize; you can check the **Instances** screen accessed from the menu on the left side to see that your Instance state is **Running**.

### Step 3a: Enable ssh access

JPL does not enable `ssh` access to AWS instances by default, instead preferring [SSM Agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html). However, most users will be much more familiar with `ssh`, and it is easier to transfer files to/from your instance with `ssh`. You can enable `ssh` using these steps:

- *Connect using SSM in the browser*: Go to **EC2** --> **Instances** and click on the instance ID of the new instance. Then click **Connect** in the upper-right part of the page. There are a few options for connecting; select **Session Manager** and then click the yellow **Connect** button. If you cannot connect and/or see an error message, you might need to wait several minutes for the instance to become available to Session Manager. A tab or window should open in your browser with a terminal window on the instance.

- *Initial setup and download of the GitHub repository*: Copy the following commands (without the leading `$` prompt) and paste them into your SSM window (using shift-insert or right-click then **Paste**):

```
$ cd ~ && sudo dnf update -y && sudo dnf install git -y
$ sudo su jpluser
$ cd ~/
$ mkdir git_repos
$ cd git_repos
$ git clone https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial.git
```

- *Enable ssh access*: The GitHub repository includes a script, `sshd_enable.sh`, that enables `ssh` access. Run it as the *root* user; otherwise you will not have the necessary permissions. Again, copy and paste the following into your SSM window:

```
$ sudo ~/git_repos/ECCO-v4-Python-Tutorial/Cloud_Setup/sshd_enable.sh
```

The script will ask if you want to move the git repo and change its ownership. Answer **Y** and enter **jpluser** for user name.

Once the script is completed, you should be able to ssh into your new instance. You can **Terminate** the SSM window. Then from your machine's terminal window, connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the key file is `~/.ssh/aws_ec2_jupyter.pem` and the private IPv4 address is 100.104.70.37, then:

```
$ ssh -i "~/.ssh/aws_ec2_jupyter.pem" jpluser@100.104.70.37 -L 9889:localhost:9889
```

The `-L` option sets up a tunnel from the local machine's port 9889 to the instance's port 9889; this will be used later to open JupyterLab through your local machine's web browser.
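
To avoid retyping these options each time, the same settings can be stored in your local `~/.ssh/config`. The sketch below only prints a suitable entry; `print_ssh_stanza` and the alias `ecco-ec2` are hypothetical names chosen for illustration, and the address and key file are the example values used above.

```shell
# print_ssh_stanza ALIAS ADDRESS KEYFILE [PORT]
# Prints an ssh_config "Host" block equivalent to the ssh command above.
# The function name and the alias used below are illustrative assumptions.
print_ssh_stanza() {
    alias_name="$1"; address="$2"; keyfile="$3"; port="${4:-9889}"
    cat <<EOF
Host $alias_name
    HostName $address
    User jpluser
    IdentityFile $keyfile
    LocalForward $port localhost:$port
EOF
}

# Example values from above; the quotes keep ~ literal so ssh expands it later.
print_ssh_stanza ecco-ec2 100.104.70.37 '~/.ssh/aws_ec2_jupyter.pem'
```

Appending the printed block to `~/.ssh/config` (for example by adding `>> ~/.ssh/config` to the last line) should let you connect with just `ssh ecco-ec2`. The private IPv4 address may differ for a new instance, so the `HostName` line would need updating in that case.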

### Step 3b: Set up conda environment

Now you need to install software (conda/miniconda/miniforge) to run Python, and then install Python packages and the Jupyter interface to run these tutorial notebooks. A shell script to expedite this process, `jupyter_env_setup.sh`, is provided on the tutorial GitHub page. This script handles most of our environment setup by doing the following:

1. Installing `wget` (which allows us to download from internet websites).

1. Installing `tmux` (which allows us to persist tasks on a remote machine even when disconnected).

1. Downloading `Miniforge.sh` from *conda-forge*, which enables us to install `conda` and `mamba` (a faster, C++-based reimplementation of `conda`).

1. Creating a new conda environment called `jupyter` that will contain the packages we need to run the notebooks.

1. Installing Python packages using a combination of `mamba` and `pip` (the latter works better when memory is limited).

1. Querying the user for their NASA Earthdata username and password (if these are already archived in a `~/.netrc` file this step is skipped).
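
Before running the script, it can be worth confirming that the root volume has room for the ~7.5 GB installation mentioned under *Configure storage*. A minimal check to run on the instance, assuming the standard POSIX `df` and `awk` tools are available (`root_avail_gib` is a hypothetical helper name):

```shell
# Print the space available on the root filesystem in whole GiB.
# df -Pk gives POSIX-format output in 1 KiB blocks; column 4 is "Available".
root_avail_gib() {
    df -Pk / | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

echo "Free space on /: $(root_avail_gib) GiB"
```

If the reported value is well below 8 GiB, the installation is likely to run out of space, and the root volume may need to be enlarged first.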

To run `jupyter_env_setup.sh`, copy, paste, and execute the following two commands on the instance:

```
$ sudo chmod 755 ~/git_repos/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh && ~/git_repos/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh
```

The script takes several minutes to run, but it should set up our environment with the packages we need. Now you can return to Step 4 of the [AWS Cloud: getting started](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial.
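
Before moving on, a quick sanity check of the result may save time later. The sketch below looks for an Earthdata entry in `~/.netrc` and lists the conda environments. The host name `urs.earthdata.nasa.gov` is an assumption about what the setup script writes, and `has_netrc_entry` is a hypothetical helper name.

```shell
# has_netrc_entry HOST [FILE]: succeed if FILE (default ~/.netrc) contains
# a "machine HOST" entry. The host name used below is an assumption.
has_netrc_entry() {
    grep -q "machine[[:space:]][[:space:]]*$1" "${2:-$HOME/.netrc}" 2>/dev/null
}

if has_netrc_entry urs.earthdata.nasa.gov; then
    echo "Earthdata credentials found in ~/.netrc"
else
    echo "No Earthdata entry found in ~/.netrc"
fi

# List conda environments if conda is on PATH; "jupyter" should be among them.
if command -v conda >/dev/null 2>&1; then
    conda env list
else
    echo "conda not found on PATH (try opening a new shell first)"
fi
```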
