Useful Commands for HPC (NYU Shanghai, Pudong Campus)
The official documentation: https://github.com/michael-qi/nyush_hpc
Creating Virtual Environment
## load anaconda
$ module load anaconda3/5.2.0
## create virtual environment
$ conda create -n myEnv python=3.7 anaconda
## activate
$ source activate myEnv
## install packages
$ conda install -n myEnv [package]
To run a script inside the environment as a batch job, create the bash file runScript.sh:
#!/bin/bash
#SBATCH --time=120:00:00
#SBATCH --job-name=JobName
#SBATCH --output=slurm_%j.out
module purge
module load anaconda3/5.2.0
module load python/gnu/3.7.3
source activate myEnv
python myScript.py
Submit using:
$ sbatch runScript.sh
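When submissions are scripted, it can help to capture the job ID that sbatch prints, for later use with commands such as squeue or scancel. The following is a minimal sketch, assuming sbatch reports its usual "Submitted batch job <id>" line; the helper names parse_job_id and submit are hypothetical and not part of the cluster documentation.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Return the job ID from a 'Submitted batch job <id>' line, or None if absent."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script_path):
    """Run sbatch on script_path and return the parsed job ID (hypothetical helper)."""
    completed = subprocess.run(
        ["sbatch", script_path],
        capture_output=True,  # collect stdout so the job ID can be parsed
        text=True,
        check=True,           # raise if sbatch exits with an error
    )
    return parse_job_id(completed.stdout)
```

On a login node, submit("runScript.sh") would be expected to return the job ID as an integer.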
Deactivate and delete a no longer needed virtual environment:
$ source deactivate
$ conda remove -n myEnv --all
Array Jobs with Python
Array jobs run the same code with different parameters as inputs. Suppose the parameter space is 3-dimensional and each dimension has 2 possible values. That gives 2^3 = 8 combinations, so we need to run 8 tasks in parallel.
Create the python file myScript.py:
import sys
import numpy as np
Task_ID = int(sys.argv[1])
# this is equivalent to Matlab's ind2sub
ind = np.unravel_index(Task_ID-1, [2, 2, 2], 'F')
ind_x = ind[0]
ind_y = ind[1]
ind_z = ind[2]
# some code to assign parameter values to different ind_x, ind_y, ind_z values
# also do the computation
# collect output
with open("result.txt", "a+") as text_file:
    text_file.write("x: %s, y: %s, z: %s, result: %s \n" % (x, y, z, result))
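The step left as a comment in myScript.py, turning a task ID into parameter values, can be sketched without NumPy as follows. This is only an illustration: the candidate values in X_VALUES, Y_VALUES and Z_VALUES and the helper names are hypothetical. The index order follows the 'F' (first index varies fastest) convention used in the np.unravel_index call.

```python
# Hypothetical candidate values for each of the three parameters.
X_VALUES = [0.1, 1.0]
Y_VALUES = [10, 100]
Z_VALUES = ["linear", "rbf"]


def task_to_indices(task_id, shape):
    """Map a 1-based task ID to 0-based grid indices, first index varying fastest."""
    total = 1
    for size in shape:
        total *= size
    if not 1 <= task_id <= total:
        raise ValueError("task_id must be between 1 and %d" % total)
    flat = task_id - 1
    indices = []
    for size in shape:
        # Peel off one dimension at a time, column-major style.
        flat, index = divmod(flat, size)
        indices.append(index)
    return tuple(indices)


def task_to_params(task_id):
    """Look up the parameter combination assigned to a task ID."""
    shape = (len(X_VALUES), len(Y_VALUES), len(Z_VALUES))
    ind_x, ind_y, ind_z = task_to_indices(task_id, shape)
    return X_VALUES[ind_x], Y_VALUES[ind_y], Z_VALUES[ind_z]
```

With this layout, each task ID from 1 to 8 selects a distinct combination of the three parameters.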
In the bash file:
#!/bin/bash
#SBATCH --time=120:00:00
#SBATCH --job-name=JobName
#SBATCH --output=slurm_%j.out
#SBATCH --array=1-8
python myScript.py $SLURM_ARRAY_TASK_ID
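Because all array tasks in the example append to the same result.txt, their writes could interleave when tasks finish at about the same time. One possible alternative, sketched here under that assumption, is to let each task write its own file and merge the files once the array has finished. The file naming scheme and both helper names are hypothetical.

```python
import os
import re


def write_task_result(out_dir, task_id, line):
    """Write one task's result line to its own file, e.g. result_3.txt."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "result_%d.txt" % task_id)
    with open(path, "w") as text_file:
        text_file.write(line.rstrip("\n") + "\n")
    return path


def merge_results(out_dir, merged_path):
    """Concatenate all per-task files in task-ID order; return how many were merged."""
    pattern = re.compile(r"result_(\d+)\.txt$")
    found = []
    for name in os.listdir(out_dir):
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    found.sort()  # numeric order, so task 10 comes after task 2
    with open(merged_path, "w") as merged:
        for _, name in found:
            with open(os.path.join(out_dir, name)) as part:
                merged.write(part.read())
    return len(found)
```

In myScript.py, write_task_result would take the place of the append to result.txt, and merge_results would be run once after all tasks complete.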
Using Jupyter Lab
First create a virtual environment (with the required libraries installed). Then create the bash file jupyter.sh (this example requests 1 GPU):
#!/bin/bash
#SBATCH --partition=aquila
#SBATCH --nodelist=agpu1
#SBATCH --gres=gpu:1
#SBATCH --job-name=jupyter
#SBATCH --output=jupyter-log.txt
#SBATCH --time=120:00:00
#SBATCH --mem=80GB
module purge
module load python/gnu/3.7.3
module load anaconda3/5.2.0
source activate myEnv
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
echo -e "\n"
echo " Paste ssh command in a terminal on local host (i.e., laptop)"
echo " ------------------------------------------------------------"
echo -e " ssh -N -L $ipnport:$ipnip:$ipnport $USER@hpc.shanghai.nyu.edu\n"
echo " Open this address in a browser on local host; see token below"
echo " ------------------------------------------------------------"
echo -e " localhost:$ipnport \n\n"
jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
Submit the job, then view the log once it has started:
$ sbatch jupyter.sh
$ cat jupyter-log.txt
Then open a new terminal on your local machine (e.g., laptop) and paste the ssh command from jupyter-log.txt. It looks like the following, with the variables replaced by actual values:
$ ssh -N -L $ipnport:$ipnip:$ipnport $USER@hpc.shanghai.nyu.edu
Finally, open a browser with the following address:
localhost:$ipnport
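The ssh command and browser address can also be pulled out of the log text programmatically. A minimal sketch, assuming the log contains the line printed by the echo command in jupyter.sh; the helper name find_tunnel and the pattern are hypothetical.

```python
import re

# Matches the "ssh -N -L port:ip:port user@host" line echoed by jupyter.sh.
TUNNEL_PATTERN = re.compile(r"ssh -N -L (\d+):(\S+?):(\d+) (\S+)")


def find_tunnel(log_text):
    """Return (ssh_command, browser_address) found in the log text, or None."""
    match = TUNNEL_PATTERN.search(log_text)
    if match is None:
        return None
    local_port = match.group(1)
    return match.group(0), "localhost:%s" % local_port
```

For example, the contents of jupyter-log.txt could be read with open("jupyter-log.txt").read() and passed to find_tunnel. The access token still has to be copied from the Jupyter output further down in the log.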
Installing R packages
In the shell:
$ module load R/gnu/3.6.3
$ R
> install.packages("myPackage")
If R asks whether to use a personal library for the installation, answer "yes".