Slurm Batch Jobs
Most work on the cluster is executed as batch jobs. A batch job is a shell script that includes:
- Slurm parameters setting resource limits for your job. These parameters are set in lines beginning with #SBATCH at the top of the script.
- Commands to load any necessary software modules and environments (e.g., a Python virtualenv for Python code)
- Execution of your code or application
Batch Job Types
Slurm batch jobs generally fall into three categories:
- Single Task Repeated with Different Inputs: This involves running a single task multiple times in parallel, often with varying inputs. This is typically done with Slurm job arrays, and is the simplest and most common method for statistical computing. See Example: Basic Job Array below for details.
- Multi-Core Application Execution: This involves allocating cluster resources to run an application that uses multiple CPU cores. In this scenario, the application itself must manage parallelization across the cluster, typically using a Slurm library or MPI (Message Passing Interface). See Example: MPI.
- GPU Jobs: GPU resources are handled differently in Slurm than CPU core allocations. See Example: GPU.
Example: Basic Job Array
If you need to run a single-process program many times in parallel, you can use an sbatch script to allocate resources for a single process, then use the --array parameter to run your process multiple times in parallel.
Tip
Code often needs to be restructured before it can run efficiently on the cluster. A common scenario involves a program that uses a for loop to run multiple iterations. To optimize for cluster execution, the code should be refactored to remove the for loop, so it only handles a single iteration. A Slurm job array can then be used to run that iteration multiple times in parallel.
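To make the tip concrete, here is a minimal sketch of a refactored script in R; the analyze() function, the inputfileN naming, and the .rds output are hypothetical placeholders, not part of the batch script below.
# myscript.R (sketch): handles exactly one input file, passed on the command line.
# analyze() is a stand-in for your real computation.
analyze <- function(path) {
  dat <- read.csv(path)
  summary(dat)
}

# Before refactoring, a single process might have looped over every input:
#   for (i in 1:123) results[[i]] <- analyze(paste0("inputfile", i))
# After refactoring, each run processes only the file it is given.
args <- commandArgs(trailingOnly = TRUE)
infile <- args[1]                       # e.g., "inputfile7", supplied by the sbatch script
result <- analyze(infile)
saveRDS(result, paste0(infile, "_result.rds"))
The batch script below allocates resources for one such run and uses --array to launch it 123 times, once per input file.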
#!/bin/bash
#SBATCH --job-name=myjob # Set a name for your job
#SBATCH --partition=short # Slurm partition to use
#SBATCH --ntasks=1 # Number of tasks to run. By default, one CPU core will be allocated per task
#SBATCH --time=0-05:00 # Time limit in D-HH:MM
#SBATCH --mem-per-cpu=300 # Memory limit for each task (in MB)
#SBATCH --array=1-123 # The number of iterations for this job
#SBATCH -o myscript_%j.out # File to which STDOUT will be written
#SBATCH -e myscript_%j.err # File to which STDERR will be written
#SBATCH --mail-type=NONE # Type of email notification: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-user=<yourmail>@uw.edu # Email to which notifications will be sent
module load R-bundle-CRAN
Rscript myscript.R inputfile$SLURM_ARRAY_TASK_ID
Example Details:
This example runs an R script 123 times in parallel, with each iteration reading an input file numbered according to the job array task ID (inputfile1, inputfile2, inputfile3, etc.). Each iteration can run for up to five hours and consume up to 300 MB of memory.
- #SBATCH --partition=short: This specifies that the job will use the short partition.
- #SBATCH --ntasks=1: Each iteration run by Slurm uses only a single task (i.e., CPU core, thread, or process).
- #SBATCH --mem-per-cpu: Set this a little higher than the maximum amount of RAM you expect a single job step will consume.
- #SBATCH --time: Set this a little longer than the maximum run time you expect for a single job step.
- module load R-bundle-CRAN: After defining the Slurm #SBATCH parameters, load packages for your job with Environment Modules.
- Rscript ...: The final line runs your code. If you need to pass different parameters each time a job is run, you can use the $SLURM_ARRAY_TASK_ID environment variable to reference different files or inputs.
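Another common pattern, sketched below, is to read $SLURM_ARRAY_TASK_ID inside R and use it to select a row from a parameter table; the params.csv file and its columns are hypothetical and not part of the example above.
# Sketch: pick this task's parameters from a (hypothetical) params.csv
# with one row per array index, e.g. columns n, sigma, seed.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
params  <- read.csv("params.csv")
p       <- params[task_id, ]            # parameter row for this array task

set.seed(p$seed)
x <- rnorm(p$n, sd = p$sigma)           # stand-in for the real simulation
saveRDS(mean(x), sprintf("result_%03d.rds", task_id))
With this approach the Rscript line in the batch script needs no file-name argument; each task looks up its own row.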
Running the example:
- Run the job: sbatch example.sbatch
- Verify that the job is in the queue: squeue --me
- See completed or failed job steps: sacct-detail
Example: MPI
This example still needs to be written. Contact help@stat.washington.edu if you need assistance with MPI.
Example: GPU
The gpu partition contains two GPU nodes. Each node has two NVIDIA Tesla V100 GPUs. To use these nodes, specify the number of GPUs required per task with --gres=gpu:<n>, where <n> should be set to either 1 or 2.
For most GPU computing tasks, the CUDA module will need to be loaded before running any code. If you need R or Python libraries specific to GPU computing, these will need to be installed in an interactive session with the CUDA module loaded.
This example runs a single task using one GPU. To run multiple iterations of a GPU task, add a line like #SBATCH --array=1-<n tasks>
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --partition=gpu
#SBATCH --time 110                      # Time limit in minutes
#SBATCH --mem-per-cpu=800               # Memory limit for each task (in MB)
#SBATCH --ntasks=1                      # Number of tasks to run
#SBATCH --gres=gpu:1                    # Number of GPUs required per task
#SBATCH --mail-type=NONE
#SBATCH --mail-user=myname@uw.edu
module load CUDA
./my_gpu_program
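If you want to confirm the GPU allocation before running a longer job, a small check script can be substituted for ./my_gpu_program. The sketch below is an R version; it assumes nvidia-smi is available on the GPU nodes and that Slurm exports CUDA_VISIBLE_DEVICES for the allocation.
# check_gpu.R (hypothetical): report which GPU(s) this job can see.
cat("CUDA_VISIBLE_DEVICES:", Sys.getenv("CUDA_VISIBLE_DEVICES", unset = "<not set>"), "\n")
status <- system("nvidia-smi -L")       # lists the GPUs visible to this job
if (status != 0) cat("nvidia-smi not found or no GPU visible\n")
To run it, also load R in the script (module load R-bundle-CRAN) and replace the last line with Rscript check_gpu.R.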
Example: Single Node Parallelization
If you are using a parallelization library or method that does not support network execution, you can limit your Slurm job to a single node. Examples include forking background processes in bash with &, GNU Parallel, multiprocessing in Python, and doParallel in R (a doParallel sketch appears after the batch script below).
This has two major downsides:
- The job is limited to a fraction of the available resources on the cluster, since it is restricted to one of the 82 nodes available.
- Since the cores and memory required need to be within a single node, it may be stuck in the queue for a long time until an appropriate block of resources can be allocated.
Given these constraints, this method may still be worthwhile if you do not require significant compute resources and want to avoid rewriting code that uses a non-networked parallelization method. For any significant workload, we recommend restructuring your code to run a single iteration and using Slurm job arrays to parallelize your workload (see Example: Basic Job Array above for details).
Info
For a single node job, the number of tasks (--ntasks=) cannot exceed the number of CPU cores available on the node. Currently, most nodes in the cluster have 48 or 52 cores each.
#!/bin/bash
#SBATCH --job-name=single_node # Set a name for your job
#SBATCH --partition=short # Slurm partition to use
#SBATCH --ntasks=24 # Number of tasks to run (limit 52, 24 or less recommended)
#SBATCH --nodes=1 # Limit this job to a single node
#SBATCH --time 0-05:00 # Time limit in D-HH:MM
#SBATCH --mem-per-cpu=100             # Memory limit for each task (in MB)
#SBATCH -o myscript_%j.out # File to which STDOUT will be written
#SBATCH -e myscript_%j.err # File to which STDERR will be written
#SBATCH --mail-type=NONE              # Type of email notification: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-user=<me>@uw.edu # Email to which notifications will be sent
module load R-bundle-CRAN
Rscript myscript.R
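For reference, here is a minimal sketch of what myscript.R might look like with doParallel under this allocation; the loop body is a placeholder, and the worker count is read from Slurm's SLURM_NTASKS environment variable so it matches --ntasks.
# myscript.R (sketch): single-node parallelism with doParallel.
library(doParallel)

n_workers <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))  # 24 in the script above
cl <- makeCluster(n_workers)
registerDoParallel(cl)

results <- foreach(i = 1:1000, .combine = c) %dopar% {
  mean(rnorm(1e5))                      # placeholder workload for iteration i
}

stopCluster(cl)
saveRDS(results, "results.rds")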