Bayesian Deep Learning

Tufts CS Special Topics Course | COMP 150 - 03 BDL | Fall 2018

Tufts HPC Setup Instructions


The Tufts High Performance Computing (HPC) center has generously offered free research accounts to all students enrolled in the BDL class.

We have provided instructions for Unix/Linux/MacOS users. Windows users should be able to follow similar (but not exactly the same) commands.

Jump to: Interactive Workflow - Batch Submission Workflow - First-time Setup

For support requests, please consult the Tufts HPC documentation: https://wikis.uit.tufts.edu/confluence/display/TuftsUITResearchComputing/High+Performance+Compute+Cluster

You can also contact your course staff.

Interactive Workflow

The primary way to use the cluster will be interactively. You log in to an HPC computer, run a single job via your terminal, watch it make progress, maybe kill it, adjust some code, and then run it again. When it's done, you can safely log out.

Every time you wish to use the cluster this way, you should do these steps:

Step 1: Login via SSH

You'll need your Tufts username, which usually looks something like 'amurphy02' or 'xli88'.

Execute the following SSH command in a Unix-like terminal on your laptop or desktop:

Terminal of your laptop   IN:
ssh TUFTS_USERNAME@login.cluster.tufts.edu
Expected OUT:
(prompt for a password)

You'll be prompted to enter your Tufts password. Please enter it.

After successful password entry, you should see an active terminal session appear. If you look at your terminal prompt, you'll see something like 'TUFTS_USERNAME@login001'. This means you've successfully ssh-ed into a 'login' node on the cluster.

Login nodes are designed only to be an entrypoint to the cluster.

You should NOT do serious computing on the login nodes. Login nodes are only for cd-ing around your files, reading/writing text files, and asking the HPC system for more resources.

Think of the login node like the lobby of a building. You need to enter it to do your work, and you might pause for a few minutes at an empty chair to check your email, but you wouldn't claim a permanent desk and do 40 hours a week of intense work.

Instead, please launch an interactive session on an HPC computing node.

Step 2: Start interactive session

At your terminal, do the following:

Terminal of remote HPC   IN:
srun -t 0-02:00 --mem 2000 -p interactive --pty bash
Expected OUT:
New terminal prompt looks like: TUFTS_USERNAME@alpha001

This gives you an interactive session with a bash terminal, a 2-hour time limit, and 2GB of memory. You can adjust the -t flag (time limit, written as days-hours:minutes) or the --mem flag (memory in megabytes) as needed.
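
As a minimal sketch of adjusting those flags, the hypothetical helper below (interactive_cmd is our own name, not a Slurm or Tufts tool) only prints the srun command for a given number of hours and amount of memory, so you can inspect it before pasting it into your login-node terminal. It assumes a session shorter than one day and that the interactive partition permits the time you request.

```shell
# Hypothetical helper: print (do not run) an srun command for an interactive session.
#   $1 = hours (whole number from 0 to 23), $2 = memory in megabytes
interactive_cmd() {
    hours="$1"
    mem_mb="$2"
    # -t uses Slurm's days-hours:minutes format; --mem is read as megabytes by default.
    printf 'srun -t 0-%02d:00 --mem %d -p interactive --pty bash\n' "$hours" "$mem_mb"
}

# Example: a 4-hour session with 4000 MB of memory.
interactive_cmd 4 4000
```

The example call prints: srun -t 0-04:00 --mem 4000 -p interactive --pty bash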

Step 3: Activate your conda environment

Always activate your environment as soon as you log in, so that you can be sure you're able to load all your required packages.

(FYI: this won't work until you've completed the first-time setup instructions below.)

Terminal of remote HPC   IN:
source activate bdl_pytorch_readonly
Expected OUT:
New terminal prompt prefix: (bdl_pytorch_readonly)

Step 4: Go run your code!

You can run your code in this interactive environment in the usual way.

For example, if you have a script that produces a .csv file of results you can do:

python my_script.py --output /cluster/home/TUFTS_USERNAME/my_results.csv

Here we assume my_script.py writes its .csv file to the path given by --output, which in this example is your home directory on the cluster file system.

Step 5: Download your results to your laptop

A typical pattern when working on a remote cluster is to run long simulations there, then download the results to your laptop so you can make plots, etc.

For Unix/Linux/MacOS users, we recommend the scp tool, which works like this:

Terminal of your laptop   IN:
scp TUFTS_USERNAME@login.cluster.tufts.edu:/cluster/home/TUFTS_USERNAME/*.csv /tmp/
Expected OUT:
my_results.csv 100% 0 0.0KB/s 00:00

That's all! You've now got a file called "my_results.csv" in your /tmp/ folder.
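
A slightly more defensive variant, sketched below, quotes the remote path so that the *.csv pattern is expanded on the cluster rather than by your local shell (some shells, such as zsh, refuse to run a command whose unquoted pattern matches no local files), and downloads into a dedicated folder instead of /tmp/. The helper name download_cmd and the folder name bdl_results are illustrative assumptions; the function only prints the commands so you can check them before running them yourself.

```shell
# Hypothetical helper: print the commands to download all .csv files from your cluster home.
#   $1 = Tufts username, $2 = local destination folder
download_cmd() {
    user="$1"
    dest="$2"
    echo "mkdir -p '$dest'"
    # Single quotes around the remote path keep the local shell from expanding *.csv.
    echo "scp '$user@login.cluster.tufts.edu:/cluster/home/$user/*.csv' '$dest/'"
}

# Example: print the commands for a folder named bdl_results in your local home directory.
download_cmd TUFTS_USERNAME "$HOME/bdl_results"
```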

Batch Submission Workflow

If you wish to have many jobs running at once, you can use batch submission.

Your instructor has prepared a detailed example (based on HW4) here:

https://github.com/tufts-ml/comp150_bdl_2018f_public/blob/master/hpc_example/README.md

Please be a good citizen when using the cluster. Do not request too many resources at once or use more than you really need.
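
For orientation only, here is a minimal sketch of what a batch job script can look like; it is not the instructor's example linked above. The file name run_one_job.sh, the partition name batch, and the log file name are assumptions to check against the Tufts HPC documentation, while the time and memory requests and my_script.py mirror the interactive workflow. The snippet writes the job script to disk and syntax-checks it without submitting anything.

```shell
# Write a hypothetical Slurm job script (file name is illustrative).
cat > run_one_job.sh <<'EOF'
#!/usr/bin/env bash
# Time limit of 2 hours (days-hours:minutes) and 2000 MB of memory.
#SBATCH -t 0-02:00
#SBATCH --mem 2000
# ASSUMED partition name; check the Tufts HPC documentation.
#SBATCH -p batch
# Log file for the job's output; %j is replaced by the job id.
#SBATCH -o my_job.%j.log

source activate bdl_pytorch_readonly
python my_script.py --output /cluster/home/TUFTS_USERNAME/my_results.csv
EOF

# Check the script for shell syntax errors; this does not submit it.
bash -n run_one_job.sh && echo "run_one_job.sh looks OK"
```

After replacing TUFTS_USERNAME, you would submit the job from a login node with sbatch run_one_job.sh and check on it with squeue -u TUFTS_USERNAME.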

First Time Only: Setup Your Conda Environment

You can either use a provided environment or install your own (full control, but perhaps more headaches). Course staff have run into many install headaches on the Tufts HPC systems (the GLIBC is out of date, so neither PyTorch nor TensorFlow installs easily). So we suggest you try out our provided environment.

Option 1: Use provided conda environment 'bdl_pytorch_readonly'

Provided "bdl_pytorch_readonly" conda environment features:

  • Python 2.7
  • Packages: PyTorch, TensorFlow, NumPy, SciPy, scikit-learn (sklearn)
  • Viz Packages: matplotlib and seaborn
  • Avoids any headaches related to installing PyTorch yourself
  • You CANNOT edit this install directly (it is "read only"), but you can optionally extend it by pip-installing packages in your $HOME directory

Setup Step 0

Be sure you are logged in to the Tufts HPC systems.

Setup Step 1

Edit your ~/.bashrc file so that it contains these lines:

# Inform bash shell where to find 'conda' and the 'activate' script
export PATH="/cluster/tufts/hugheslab/miniconda2/bin/:$PATH"

# Set default matplotlib backend so you can make plots via terminal
export MPLBACKEND='agg'

FYI: You will have read access to /cluster/tufts/hugheslab/miniconda2/, but no write access to any folder there. So you cannot make changes (this is by design, since all users need these packages).
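
If you would rather not open an editor, one possible approach (a sketch, not an official procedure) is to append the two export lines from the shell only when they are not already present, so that re-running it does not create duplicates. The function name append_once is our own, and the target file is assumed to be ~/.bashrc.

```shell
# Hypothetical helper: append a line to a file unless that exact line is already there.
#   $1 = file, $2 = line
append_once() {
    file="$1"
    line="$2"
    # grep -x matches whole lines, -F treats the pattern as a fixed string, -q is quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $PATH unexpanded here, so it is evaluated each time bash starts.
append_once "$HOME/.bashrc" 'export PATH="/cluster/tufts/hugheslab/miniconda2/bin/:$PATH"'
append_once "$HOME/.bashrc" "export MPLBACKEND='agg'"
```

Afterwards, run source ~/.bashrc (or log out and back in) so the changes take effect in your current session.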

Setup Step 1 Verification

Verify Step 1 above by interacting with the terminal:

  • Activate the environment
IN:
source activate bdl_pytorch_readonly
Expected OUT:
Nothing. Only a change to your prompt.
  • Verify correct version of Python is on path
IN:
which python
Expected OUT:
/cluster/tufts/hugheslab/miniconda2/envs/bdl_pytorch_readonly/bin/python
  • Verify you can access sklearn
IN:
python -c "import sklearn; print(sklearn.__version__)"
Expected OUT:
0.19.2
  • Verify you can access PyTorch
IN:
python -c "import torch; print(torch.__version__)"
Expected OUT:
0.4.1

That's it! You should be ready to go on to the next step.
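
The import checks above can also be run in one pass. The sketch below defines a hypothetical check_module helper (our own name) that tries to import each package with whatever python is first on your PATH and reports OK or MISSING; it does not compare version numbers.

```shell
# Hypothetical helper: report whether the current python can import a module.
#   $1 = module name
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: MISSING"
    fi
}

# Run this after 'source activate bdl_pytorch_readonly'.
for module in sklearn torch; do
    check_module "$module"
done
```

If either line reports MISSING, re-check that the PATH line from Setup Step 1 is in your .bashrc and that the environment is activated.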

Setup Step 2

Create a custom place to install any extra packages you need:

mkdir -p "$HOME/my_bdl_pytorch_packages/"

Please use exactly the directory "my_bdl_pytorch_packages".

Now, if you need to pip install anything, your package will be put there!

Note: conda install will NOT work. Only pip install works (for the moment).

Option 2: Install your own custom conda environment

Follow the instructions on the Python environment setup page (setup_python_env.html).

Be ready for several complications:

  • Usually, a full install of bdl_pytorch_env requires over 4GB. The quota for Tufts HPC is around 5GB. So you'll have little space left over for storing data or results.

  • Tufts HPC has a very old version of GLIBC. So neither pip install pytorch nor conda install pytorch will work out of the box.
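
Because space is tight, it can help to check how much disk space you are using before and after an install. The following is a minimal sketch, assuming the roughly 5GB quota mentioned above applies to your home directory; usage_report is a hypothetical helper, and du only approximates what the quota system actually counts.

```shell
# Hypothetical helper: print the size of a directory next to a quota, both in megabytes.
#   $1 = directory, $2 = quota in megabytes
usage_report() {
    dir="$1"
    quota_mb="$2"
    # du -sk prints the total size in kilobytes, then a tab, then the path.
    used_kb="$(du -sk "$dir" 2>/dev/null | cut -f1)"
    used_mb=$(( ${used_kb:-0} / 1024 ))
    echo "${used_mb} MB used of about ${quota_mb} MB in $dir"
}

# Example: compare your home directory against a quota of about 5000 MB.
usage_report "$HOME" 5000
```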