Tufts HPC Setup Instructions


The Tufts High Performance Computing center has generously offered free research accounts to all students enrolled in the BDL class.

We have provided instructions for Unix/Linux/MacOS users. Windows users should be able to follow similar (but not exactly the same) commands.

Jump to: Interactive Workflow - Batch Submission Workflow - First-time Setup

For any issues, please first contact course staff directly via Piazza. Be sure to provide detailed printouts from your terminal of EXACTLY what you tried and what the error was.

If you have errors related to permissions for course-related directories, please contact only your instructor and TA. We are the only people who maintain the software environments for this course. TTS is not responsible for any software issues (related to python, conda, etc.).

TTS can address issues related to these services:

  • logging in via VPN
  • dealing with disk space quota issues
  • questions about hardware available on the cluster and how to request it
  • questions about SLURM (the software used to submit many jobs at once to the cluster)

For help on these issues (and not software ones) please contact tts-research@tufts.edu and CC your instructor (mhughes@cs.tufts.edu).

For general information, please look at the available documentation from Tufts HPC: https://wikis.uit.tufts.edu/confluence/display/TuftsUITResearchComputing/High+Performance+Compute+Cluster

Interactive Workflow

The primary way to use the cluster will be interactively. You log in to an HPC computer, run a single job via your terminal, watch it make progress, maybe kill it, adjust some code, and then run it again. When it's done, you can safely log out.

Every time you wish to use the cluster this way, you should do these steps:

Step 0: If Off-Campus, Establish a VPN connection

If you are off-campus, you'll need to be first connected to the Tufts VPN. For instructions, see https://access.tufts.edu/vpn

Step 1: Login via SSH

You'll need your Tufts Username, which usually looks something like 'amurphy02' or 'xli88'.

Execute the following SSH command in a UNIX-like terminal on your laptop or desktop:

Terminal of your laptop   IN:
ssh TUFTS_USERNAME@login.cluster.tufts.edu
Expected OUT:
(prompt for a password)

You'll be prompted to enter your Tufts password. Please enter it.

After successful password entry, you should see an active terminal session appear. If you look at your terminal prompt, you'll see something like 'TUFTS_USERNAME@login001'. This means you've successfully ssh-ed into a 'login' node on the cluster.

Login nodes are designed only to be an entrypoint to the cluster.

You should NOT do serious computing on the login nodes. Login nodes are only for cd-ing around your files, reading/writing text files, and asking the HPC system for more resources.

Think of the login node like the lobby of a building. You need to enter it to do your work, and you might pause for a few minutes at an empty chair to check your email, but you wouldn't claim a permanent desk and do 40 hours a week of intense work.

Instead, please launch an interactive session on an HPC computing node.

Step 2: Start interactive session

For an interactive CPU only session

At your terminal, do the following:

Terminal of remote HPC   IN:
srun -t 0-02:00 --mem 2000 -p interactive --pty bash
Expected OUT:
New terminal prompt looks like: USERNAME@alpha001

This requests an interactive session with a bash terminal, a 2-hour time limit, and about 2GB of memory (the --mem value is given in megabytes). You can adjust the -t or the --mem requests as needed.

For an interactive GPU session

At your terminal, do the following:

Terminal of remote HPC   IN:
srun -t 0-02:00 --mem 2000 -p gpu --pty bash
Expected OUT:
New terminal prompt looks like: USERNAME@pgpuXX

Just like above, you can adjust the time limit (-t) or the memory (--mem) requests as needed.

Step 3: Activate your conda environment

Always activate your environment as soon as you log in, so that you can be sure you're able to load all your required packages.

(FYI: This won't work until you've completed the first-time setup instructions below.)

Terminal of remote HPC   IN:
conda activate bdl2019f_readonly
Expected OUT:
New terminal prompt prefix: (bdl2019f_readonly)

Step 4: Go run your code!

You can run your code in this interactive environment in the usual way.

For example, if you have a script that produces a .csv file of results you can do:

python my_script.py --output /cluster/home/TUFTS_USERNAME/my_results.csv

We imagine this writes a .csv file to your home directory on the cluster file system.
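For concreteness, here is a minimal sketch of what such a script might look like. The file name my_script.py and the --output flag come from the command above; the column names and values are made up purely for illustration, and the default output path is an assumption.

```python
import argparse
import csv


def main(argv=None):
    # Parse the --output flag used in the example command above
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default="my_results.csv",
                        help="path of the .csv file to write")
    args = parser.parse_args(argv)

    # Placeholder results (hypothetical); a real script would compute these
    rows = [
        {"epoch": 1, "loss": 0.9},
        {"epoch": 2, "loss": 0.7},
    ]

    # Write a header line followed by one line per result row
    with open(args.output, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["epoch", "loss"])
        writer.writeheader()
        writer.writerows(rows)
    return args.output


if __name__ == "__main__":
    main()
```

Taking the output path as a command-line flag means the same script can write to your home directory or to the class storage without any code changes.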

Step 5: Use the BDL class-specific storage

Feel free to store relevant files in our dedicated storage space: /cluster/tufts/class/comp/150bdl

This storage can hold about 20 GB per student. Note that your home directory can store only ~5GB before you hit quota.
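To get a rough idea of how close a folder is to these limits, you could add up file sizes yourself. The sketch below is only an estimate: the helper name directory_size_bytes is hypothetical, and the cluster's quota system may count usage differently than a simple sum of file sizes.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
    return total
```

For example, directory_size_bytes(os.path.expanduser("~")) / 1e9 gives the approximate size of your home directory in GB, which you can compare against the ~5GB quota.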

Step 6: Download your results to your laptop

A key part of working on a remote cluster: you'll probably want to run long simulations there, then download the results to your laptop so you can make plots, etc.

For Unix/Linux/MacOS users, we recommend the scp tool, which works like this:

Terminal of your laptop   IN:
scp TUFTS_USERNAME@login.cluster.tufts.edu:/cluster/home/TUFTS_USERNAME/*.csv /tmp/
Expected OUT:
my_results.csv 100% 0 0.0KB/s 00:00

That's all! You've now got a file called "my_results.csv" in your /tmp/ folder.

Batch Submission Workflow

If you wish to have many jobs running at once, you can use batch submission.

Your instructor has prepared a detailed example (based on HW4) here:

https://github.com/tufts-ml-courses/comp150-bdl-19f-assignments//blob/master/hpc_example/README.md

Please be a good citizen when using the grid. Do not request too many resources at once or use more than you really need.
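As a rough illustration of what "many jobs at once" can mean (this is not the instructor's example linked above), the sketch below builds one sbatch command per hyperparameter setting and prints it as a dry run, without submitting anything. The partition name "batch", the --lr flag, and the helper name build_sbatch_command are assumptions for illustration. The time and memory defaults mirror the interactive srun example.

```python
import shlex


def build_sbatch_command(script_args, job_name, time_limit="0-02:00",
                         mem_mb=2000, partition="batch"):
    """Build (but do not run) an sbatch command line as a list of arguments."""
    # The command each job will execute on its compute node
    wrapped = "python " + " ".join(shlex.quote(a) for a in script_args)
    return [
        "sbatch",
        "--job-name", job_name,
        "--time", time_limit,
        "--mem", str(mem_mb),          # megabytes
        "--partition", partition,      # assumed partition name
        "--output", job_name + ".log",
        "--wrap", wrapped,
    ]


# Dry run: print one command per hypothetical learning rate
for lr in [0.1, 0.01]:
    name = "lr%s" % lr
    cmd = build_sbatch_command(
        ["my_script.py", "--lr", str(lr), "--output", name + ".csv"],
        job_name=name)
    print(" ".join(shlex.quote(c) for c in cmd))
```

Printing the commands first lets you check the requested time and memory before submitting, which helps avoid asking for more resources than you need.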

First Time Only: Setup Your Conda Environment

You can either use a provided environment or install your own (full control, but perhaps more headaches). Course staff have hit many installation problems on the Tufts HPC systems (the GLIBC is out-of-date, so neither PyTorch nor Tensorflow installs easily), so we suggest you try our provided environment first.

Option 1: Use provided conda environment 'bdl2019f_readonly'

Provided "bdl2019f_readonly" conda environment features:

  • Python 3.6
  • Packages:
    • PyTorch 1.3.1 (with GPU support)
    • Tensorflow 1.14.0 (with GPU support)
    • Numpy 1.17.2
    • Scipy 1.3.1
    • Sklearn 0.21.3
    • Autograd
  • Viz Packages
    • pandas 0.25.1
    • matplotlib 3.1.1
    • seaborn

Keep in mind: you CANNOT edit this install directly (it is "read only"), but you can optionally extend it by pip-installing packages into your $HOME directory.

Setup Step 0

Be sure you are logged in to the Tufts HPC systems.

Setup Step 1

Edit .bashrc to contain these lines:

# Inform bash shell where to find 'conda' and the 'activate' script
export PATH="/cluster/tufts/hugheslab/miniconda2/bin/:$PATH"

# Set default matplotlib backend so you can make plots via terminal
export MPLBACKEND='agg'

After this, you should LOG OFF and then immediately LOG IN again.

This will let your changes to the .bashrc file take effect.

FYI: You will have read access to /cluster/tufts/hugheslab/miniconda2/, but no write access to any folder there, so you cannot make changes. This is by design: all users need these packages, and we don't want one student to nuke the abilities of another student.

Setup Step 2

  • Configure conda to work with your bash shell
IN:
conda init bash
Expected OUT:
Some helpful messages.

After this, you should LOG OFF and then immediately LOG IN again.

This will let your changes to .bashrc take effect.

Setup Steps 1 and 2 Verification

Verify Steps 1 and 2 above by interacting with the terminal:

  • Activate the environment
IN:
conda activate bdl2019f_readonly
Expected OUT:
Nothing. Only a change to your prompt.
  • Verify correct version of Python is on path
IN:
which python
Expected OUT:
/cluster/tufts/hugheslab/miniconda2/envs/bdl2019f_readonly/bin/python
  • Verify you can access sklearn
IN:
python -c "import sklearn; print(sklearn.__version__)"
Expected OUT:
0.21.3
  • Verify you can access Tensorflow
IN:
python -c "import tensorflow; print(tensorflow.__version__)"
Expected OUT:
1.14.0
  • Verify you can access PyTorch
IN:
python -c "import torch; print(torch.__version__)"
Expected OUT:
1.3.1

That's it! You should be ready to go on to the next step.
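If you prefer to run the package checks in one go, a small script along these lines could work. This is only a sketch: the expected versions are copied from the package list above, and the helper names installed_version and check_versions are hypothetical.

```python
import importlib

# Expected versions, copied from the provided environment's package list
EXPECTED = {
    "sklearn": "0.21.3",
    "tensorflow": "1.14.0",
    "torch": "1.3.1",
}


def installed_version(module_name):
    """Return the module's __version__, or None if the import fails
    or the module exposes no version string."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected):
    """Return a list of (name, found_version, matches_expected) tuples."""
    report = []
    for name, wanted in expected.items():
        found = installed_version(name)
        report.append((name, found, found == wanted))
    return report


if __name__ == "__main__":
    for name, found, ok in check_versions(EXPECTED):
        status = "OK" if ok else "MISMATCH"
        print("%-12s found=%s expected=%s %s"
              % (name, found, EXPECTED[name], status))
```

Run it with python after activating the environment. A line marked MISMATCH, or found=None, suggests the environment is not active or a package failed to import.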

Setup Step 3

Create a custom place to install any extra packages you need:

mkdir $HOME/my_bdl2019f_packages/

Please use exactly the directory "my_bdl2019f_packages".

Now, if you need to pip install anything, your package will be put there!

Note: conda install will NOT work. Only pip install works (for the moment), because it lets you install packages in several directories simultaneously.
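If a pip-installed package cannot be imported afterwards, one thing to check is whether Python is actually searching this directory. The exact mechanism the environment uses is not described here, so the sketch below simply assumes that the directory (or a subfolder of it, such as a site-packages folder) must appear on sys.path. The helper name is_on_search_path is hypothetical.

```python
import os
import sys


def is_on_search_path(directory, search_path):
    """Return True if directory, or any folder inside it, appears in search_path."""
    target = os.path.normpath(directory)
    for entry in search_path:
        entry = os.path.normpath(entry)
        if entry == target or entry.startswith(target + os.sep):
            return True
    return False


if __name__ == "__main__":
    pkg_dir = os.path.join(os.path.expanduser("~"), "my_bdl2019f_packages")
    print("exists:", os.path.isdir(pkg_dir))
    print("on sys.path:", is_on_search_path(pkg_dir, sys.path))
```

If the second line prints False inside the activated environment, include that output when you contact course staff.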

Option 2: Install your own custom conda environment

Follow the [Python Setup Instructions]

Be ready for several complications:

  • Usually, a full install of bdl_2019f_env requires over 4GB. The quota for your home directory on Tufts HPC is around 5GB, so you'll have little space left over for storing data or results.

  • Tufts HPC has a very old version of GLIBC. So neither pip install pytorch nor conda install pytorch will work out of the box. To fix GLIBC issues, you might follow these instructions: https://gist.github.com/michaelchughes/85287f1c6f6440c060c3d86b4e7d764b. Be aware these are NOT easy for novices.