AWS Setup Instructions for Tufts EECS


Tufts EECS has generously sponsored AWS computer access for students enrolled in the BDL class.

We have provided instructions that should work "as-is" for Unix/Linux/macOS users. Windows users should be able to follow similar (but maybe not exactly the same) commands.

Jump to: Interactive Workflow - Jupyter Notebook Workflow - Specs - First-time Setup

For support requests, please contact your course staff.

Specifications: What machines are available? How many are there?

We'll have a small number of machines (more than 1, probably fewer than 10).

Each machine has the following specs:

  • NVIDIA K80 GPU card
  • 64 GB of memory
  • 4 Xeon CPUs.

All instances will be machines named aws-gpu-X, where X is an integer. For this tutorial, we will be using aws-gpu-1.

If you decide to use AWS for your project, please notify course staff and confirm which number you will use.

If you have problems with access, please file an email ticket:

  • TO: staff(AT)eecs.tufts.edu
  • Subject: should include "registered student in COMP 150 - 03 - Bayesian Deep Learning"
  • Include your personal Tufts EECS account information
  • CC mhughes(AT)cs.tufts.edu.

Interactive Workflow

The primary way to use the AWS resources will be interactively. You'll log in via SSH and run jobs in your terminal.

Every time you wish to use the cluster this way, you should do these steps:

Step 1: Login via SSH

You'll need your Tufts EECS username, which is different from your Tufts-wide username. We'll denote this as TUFTS_EECS_USERNAME. It usually doesn't end in numbers.

Execute the following SSH command in a UNIX-like terminal on your laptop or desktop:

Terminal of your laptop   IN:
ssh TUFTS_EECS_USERNAME@aws-gpu-1.eecs.tufts.edu
Expected OUT:
(prompt for your EECS password)

You'll be prompted to enter your password. Please enter it. Again, this is a different password from the one used for Tufts-wide HPC.

After successful password entry, you should see an active terminal session appear. If you look at your terminal prompt, you'll see something like 'aws-gpu-1{TUFTS_EECS_USERNAME}'. This means you've successfully ssh-ed into the AWS machine with a beefy GPU.
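If you log in often, a small shell helper can save retyping the full address. The sketch below is only an illustration: the function names aws_host and aws_login are hypothetical, and it assumes every aws-gpu-X machine follows the same hostname pattern as aws-gpu-1.eecs.tufts.edu.

```shell
# Placeholder from the instructions above; replace with your EECS username.
EECS_USER="TUFTS_EECS_USERNAME"

# Print the hostname for machine number X (defaults to 1).
# Assumption: every aws-gpu-X machine lives under eecs.tufts.edu.
aws_host() {
    printf 'aws-gpu-%s.eecs.tufts.edu\n' "${1:-1}"
}

# Open an interactive SSH session on the chosen machine.
aws_login() {
    ssh "${EECS_USER}@$(aws_host "${1:-1}")"
}
```

With these definitions in your laptop's shell startup file, running aws_login 1 would be equivalent to the ssh command shown above.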

Step 2: Install Anaconda and create a new conda environment (reference: https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04)

Download

curl -O https://repo.continuum.io/archive/Anaconda3-5.3.0-Linux-x86_64.sh

Verify the integrity of the downloaded installer by printing its SHA-256 checksum

sha256sum Anaconda3-5.3.0-Linux-x86_64.sh
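Printing the hash is only useful if you compare it against the value published for this installer. A minimal sketch of that comparison follows; the function name verify_sha256 is hypothetical, and the expected hash is left as a placeholder that you would copy from the official Anaconda download listing.

```shell
# Compare a file's SHA-256 checksum against an expected value.
# Usage: verify_sha256 FILE EXPECTED_HASH
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<hash>  <filename>"; keep only the hash.
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: checksum matches"
    else
        echo "MISMATCH: got $actual" >&2
        return 1
    fi
}
```

For example, verify_sha256 Anaconda3-5.3.0-Linux-x86_64.sh PUBLISHED_HASH (with PUBLISHED_HASH replaced by the real value) returns a non-zero status if the download is corrupted, so you can stop before running the installer.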

Execute

bash Anaconda3-5.3.0-Linux-x86_64.sh

Start a bash shell, since the EECS machines do not use bash as the default shell

bash

Reload ~/.bashrc so that the newly installed conda is on your PATH

source ~/.bashrc

Create a new conda environment with tensorflow-gpu version 1.4.1

conda create -n ENVNAME tensorflow-gpu=1.4.1
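After the install steps, it can help to confirm that the tools you expect are actually on your PATH. The following is a small sketch, not part of the official setup: check_setup is a hypothetical helper, and checking for nvidia-smi assumes the NVIDIA driver utilities are installed on the AWS machine.

```shell
# Report whether each named command can be found on the PATH.
# Returns non-zero if any of them is missing.
check_setup() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_setup conda nvidia-smi on the AWS machine would list which of the two commands are available; if conda is reported missing, re-run the source ~/.bashrc step.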

Step 3: Activate your conda environment

Always activate your environment as soon as you log in, so that you can be sure you're able to load all your required packages.

(FYI: This won't work until you've completed the first-time setup instructions below. If you created your own environment in Step 2, activate it using the ENVNAME you chose instead.)

Terminal of remote AWS instance   IN:
source activate bdl_pytorch_readonly
Expected OUT:
New terminal prompt prefix: (bdl_pytorch_readonly)
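Besides looking at the prompt prefix, you can check the active environment from a script. Conda sets the CONDA_DEFAULT_ENV variable when an environment is activated; the sketch below uses it in a hypothetical helper named require_env.

```shell
# Succeed only if the named conda environment is currently active.
# Usage: require_env ENVIRONMENT_NAME
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        echo "ok: conda environment '$1' is active"
    else
        echo "warning: expected '$1', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

Calling require_env bdl_pytorch_readonly at the top of a job script is one way to avoid launching a long run with the wrong packages loaded.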

Step 4: Go run your code!

You can run your code in this interactive environment in the usual way.

For example, if you have a script that produces a .csv file of results you can do:

python my_script.py --output /h/TUFTS_EECS_USERNAME/my_results.csv

This assumes my_script.py writes a .csv file to your home directory on the AWS file system, which is located at /h/TUFTS_EECS_USERNAME/.
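For long runs, a dropped SSH connection would normally end a job started this way. One common workaround, offered here as an optional addition rather than something the course requires, is to launch the job with nohup and send its output to a log file. The helper name run_detached and the log file path are hypothetical.

```shell
# Start a command in the background so it survives logout,
# writing both stdout and stderr to the given log file.
# Usage: run_detached LOG_FILE COMMAND [ARGS...]
run_detached() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    # Print the process ID so the job can be checked or stopped later.
    echo "$!"
}
```

For example, run_detached /h/TUFTS_EECS_USERNAME/run.log python my_script.py --output /h/TUFTS_EECS_USERNAME/my_results.csv prints a process ID, and you can follow progress later with tail -f on the log file.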

Step 5: Download your results to your laptop

The key part of working on a remote machine is that you'll probably want to run long simulations on the remote, then download results to your laptop so you can make plots, etc.

For Unix/Linux/MacOS users, we recommend the scp tool, which works like this:

Terminal of your laptop   IN:
scp TUFTS_EECS_USERNAME@aws-gpu-1.eecs.tufts.edu:/h/TUFTS_EECS_USERNAME/*.csv /tmp/
Expected OUT:
my_results.csv 100% 0 0.0KB/s 00:00

That's all! You've now got a file called "my_results.csv" in your laptop's /tmp/ folder.
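If you download results often, the scp call can be wrapped in a helper. This is a sketch with hypothetical function names (remote_spec, fetch_results). It passes the remote pattern in quotes so that your local shell does not try to expand *.csv itself, which matters in shells such as zsh that report an error when a pattern has no local match.

```shell
# Placeholders from the instructions above; replace with your own values.
EECS_USER="TUFTS_EECS_USERNAME"
AWS_HOST="aws-gpu-1.eecs.tufts.edu"

# Print the scp source argument for a file pattern in your remote home directory.
remote_spec() {
    printf '%s@%s:/h/%s/%s\n' "$EECS_USER" "$AWS_HOST" "$EECS_USER" "$1"
}

# Copy matching remote files into a local directory (default: /tmp/).
# Usage: fetch_results 'PATTERN' [LOCAL_DIR]
fetch_results() {
    scp "$(remote_spec "$1")" "${2:-/tmp/}"
}
```

Calling fetch_results '*.csv' would then run the same transfer as the scp command above.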

Jupyter Notebook Workflow

For most parts of your project, especially big experiments, we strongly recommend running a pure Python script that doesn't require any GUI to run at the same time. However, sometimes a notebook is good for interactive debugging.

If you'd like to interact with a Jupyter Notebook that's running on the AWS machine, see the "Getting to your homework" section at the bottom of this old page:

https://comp150dl.github.io/notes/tufts-aws-tutorial/
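As a rough sketch of one common approach (which may differ from what the linked page describes), you can start Jupyter on the AWS machine without a browser and reach it from your laptop through an SSH tunnel. The port number 8888 and the function names below are assumptions chosen for illustration, and this presumes jupyter is installed in your active conda environment.

```shell
# Placeholders from the instructions above; replace with your own values.
EECS_USER="TUFTS_EECS_USERNAME"
AWS_HOST="aws-gpu-1.eecs.tufts.edu"
NOTEBOOK_PORT=8888   # assumed port; pick another if it is already in use

# Run on the AWS machine: start Jupyter without trying to open a browser.
start_notebook() {
    jupyter notebook --no-browser --port="$NOTEBOOK_PORT"
}

# Run on your laptop: forward the local port to the same port on the AWS machine.
open_tunnel() {
    ssh -N -L "${NOTEBOOK_PORT}:localhost:${NOTEBOOK_PORT}" "${EECS_USER}@${AWS_HOST}"
}
```

With the tunnel open, pointing your laptop's browser at localhost on the chosen port should show the notebook server, using the token that Jupyter prints when it starts.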

First Time Only: Setup Your Conda Environment

You can either use a provided environment or install one yourself (full control, but perhaps more headaches). These AWS instances are very clean machines, so you shouldn't have trouble following normal install instructions that use conda install or pip install.

Remember that to take full advantage of GPUs, you should install the versions of TensorFlow or PyTorch that can leverage GPUs. This can take time, but often the payoff is worth it.