Slurm Batch Jobs
Most work on the cluster is executed as batch jobs. A batch job is a shell script that includes:
- Slurm parameters setting resource limits for your job. These parameters are set in lines beginning with #SBATCH at the top of the script.
- Commands to load any necessary software modules and environments (e.g., a Python virtualenv for Python code)
- Execution of your code or application
Batch Job Types
Slurm batch jobs generally fall into three categories:
- Single Task Repeated with Different Inputs: This involves running a single task multiple times in parallel, often with varying inputs. This is typically done with Slurm job arrays, and is the simplest and most common method for statistical computing. See Example: Basic Job Array below for details.
- Multi-Core Application Execution: This involves allocating cluster resources to run an application that uses multiple CPU cores. In this scenario, the application itself must manage parallelization across the cluster, typically using a Slurm library or MPI (Message Passing Interface). See Example: MPI.
- GPU Jobs: GPU resources are handled differently in Slurm than CPU core allocations. See Example: GPU.
Example: Basic Job Array
If you need to run a single-process program many times in parallel, you can use an sbatch script to allocate resources for a single process, then use the --array parameter to run your process multiple times in parallel.
Tip
Code often needs to be restructured before it can run efficiently on the cluster. A common scenario involves a program that uses a for loop to run multiple iterations. To optimize for cluster execution, the code should be refactored to remove the for loop, so it only handles a single iteration. A Slurm job array can then be used to run that iteration multiple times in parallel.
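To make the tip concrete, here is a minimal sketch of a refactored script in R; the analyze() function, the inputfileN naming, and the .rds output are hypothetical placeholders, not part of the batch script below.
# myscript.R (sketch): handles exactly one input file, passed on the command line.
# analyze() is a stand-in for your real computation.
analyze <- function(path) {
  dat <- read.csv(path)
  summary(dat)
}

# Before refactoring, a single process might have looped over every input:
#   for (i in 1:123) results[[i]] <- analyze(paste0("inputfile", i))
# After refactoring, each run processes only the file it is given.
args <- commandArgs(trailingOnly = TRUE)
infile <- args[1]                       # e.g., "inputfile7", supplied by the sbatch script
result <- analyze(infile)
saveRDS(result, paste0(infile, "_result.rds"))
The batch script below allocates resources for one such run and uses --array to launch it 123 times, once per input file.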
#!/bin/bash
#SBATCH --job-name=myjob # Set a name for your job
#SBATCH --partition=short # Slurm partition to use
#SBATCH --ntasks=1 # Number of tasks to run. By default, one CPU core will be allocated per task
#SBATCH --time=0-05:00 # Time limit in D-HH:MM
#SBATCH --mem-per-cpu=300 # Memory limit for each task (in MB)
#SBATCH --array=1-123 # The number of iterations for this job
#SBATCH -o myscript_%j.out # File to which STDOUT will be written
#SBATCH -e myscript_%j.err # File to which STDERR will be written
#SBATCH --mail-type=NONE # Type of email notification: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-user=<yourmail>@uw.edu # Email to which notifications will be sent
module load R-bundle-CRAN
Rscript myscript.R inputfile$SLURM_ARRAY_TASK_ID
Example Details:
This example runs an R script 123 times in parallel, with each iteration reading an input file numbered according to the job array task ID (inputfile1, inputfile2, inputfile3, etc.). Each iteration can run for up to five hours and consume up to 300 MB of memory.
- #SBATCH --partition=short: This specifies that the job will use the short partition.
- #SBATCH --ntasks=1: Each iteration run by Slurm uses only a single task (i.e., CPU core, thread, or process).
- #SBATCH --mem-per-cpu: Set this a little higher than the maximum amount of RAM you expect a single job step will consume.
- #SBATCH --time: Set this a little longer than the maximum run time you expect for a single job step.
- module load R-bundle-CRAN: After defining the Slurm #SBATCH parameters, load packages for your job with Environment Modules.
- Rscript ...: The final line runs your code. If you need to pass different parameters each time a job is run, you can use the $SLURM_ARRAY_TASK_ID environment variable to reference different files or inputs.
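Another common pattern, sketched below, is to read $SLURM_ARRAY_TASK_ID inside R and use it to select a row from a parameter table; the params.csv file and its columns are hypothetical and not part of the example above.
# Sketch: pick this task's parameters from a (hypothetical) params.csv
# with one row per array index, e.g. columns n, sigma, seed.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
params  <- read.csv("params.csv")
p       <- params[task_id, ]            # parameter row for this array task

set.seed(p$seed)
x <- rnorm(p$n, sd = p$sigma)           # stand-in for the real simulation
saveRDS(mean(x), sprintf("result_%03d.rds", task_id))
With this approach the Rscript line in the batch script needs no file-name argument; each task looks up its own row.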
Running the example:
- Run the job: sbatch example.sbatch
- Verify that the job is in the queue: squeue --me
- See completed or failed job steps: sacct-detail
Example: MPI
This example still needs to be written. Contact help@stat.washington.edu if you need assistance with MPI.
Example: GPU
The gpu partition contains two GPU nodes. Each node has two NVIDIA Tesla V100 GPUs. To use these nodes, specify the number of GPUs required per task with --gres=gpu:<n>, where <n> should be set to either 1 or 2.
For most GPU computing tasks, the CUDA module will need to be loaded before running any code. If you need R or Python libraries specific to GPU computing, these will need to be installed in an interactive session with the CUDA module loaded.
This example runs a single task using one GPU. To run multiple iterations of a GPU task, add a line like #SBATCH --array=1-<n tasks>
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --partition=gpu
#SBATCH --time 110                      # Time limit in minutes
#SBATCH --mem-per-cpu=800               # Memory limit for each task (in MB)
#SBATCH --ntasks=1                      # Number of tasks to run
#SBATCH --gres=gpu:1                    # Number of GPUs required per task
#SBATCH --mail-type=NONE
#SBATCH --mail-user=myname@uw.edu
module load CUDA
./my_gpu_program
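If you want to confirm the GPU allocation before running a longer job, a small check script can be substituted for ./my_gpu_program. The sketch below is an R version; it assumes nvidia-smi is available on the GPU nodes and that Slurm exports CUDA_VISIBLE_DEVICES for the allocation.
# check_gpu.R (hypothetical): report which GPU(s) this job can see.
cat("CUDA_VISIBLE_DEVICES:", Sys.getenv("CUDA_VISIBLE_DEVICES", unset = "<not set>"), "\n")
status <- system("nvidia-smi -L")       # lists the GPUs visible to this job
if (status != 0) cat("nvidia-smi not found or no GPU visible\n")
To run it, also load R in the script (module load R-bundle-CRAN) and replace the last line with Rscript check_gpu.R.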
Example: Single Node Parallelization
If you are using a parallelization library or method that does not support network execution, you can limit your Slurm job to a single node. Examples include forking background processes in bash with &, GNU Parallel, multiprocessing in Python, and doParallel in R (a doParallel sketch appears after the batch script below).
This has two major downsides:
- The job is limited to a fraction of the available resources on the cluster, since it is restricted to one of the 82 nodes available.
- Since the cores and memory required need to be within a single node, it may be stuck in the queue for a long time until an appropriate block of resources can be allocated.
Given these constraints, this method may still be worthwhile if you do not require significant compute resources and want to avoid rewriting code that uses a non-networked parallelization method. For any significant workload, we recommend restructuring your code to run a single iteration and using Slurm job arrays to parallelize your workload (see Example: Basic Job Array above for details).
Info
For a single node job, the number of tasks (--ntasks=) cannot exceed the number of CPU cores available on the node. Currently, most nodes in the cluster have 48 or 52 cores each.
#!/bin/bash
#SBATCH --job-name=single_node # Set a name for your job
#SBATCH --partition=short # Slurm partition to use
#SBATCH --ntasks=24 # Number of tasks to run (limit 52, 24 or less recommended)
#SBATCH --nodes=1 # Limit this job to a single node
#SBATCH --time 0-05:00 # Time limit in D-HH:MM
#SBATCH --mem-per-cpu=100             # Memory limit for each task (in MB)
#SBATCH -o myscript_%j.out # File to which STDOUT will be written
#SBATCH -e myscript_%j.err # File to which STDERR will be written
#SBATCH --mail-type=NONE              # Type of email notification: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-user=<me>@uw.edu # Email to which notifications will be sent
module load R-bundle-CRAN
Rscript myscript.R
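For reference, here is a minimal sketch of what myscript.R might look like with doParallel under this allocation; the loop body is a placeholder, and the worker count is read from Slurm's SLURM_NTASKS environment variable so it matches --ntasks.
# myscript.R (sketch): single-node parallelism with doParallel.
library(doParallel)

n_workers <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))  # 24 in the script above
cl <- makeCluster(n_workers)
registerDoParallel(cl)

results <- foreach(i = 1:1000, .combine = c) %dopar% {
  mean(rnorm(1e5))                      # placeholder workload for iteration i
}

stopCluster(cl)
saveRDS(results, "results.rds")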