Slurm Commands

Slurm commands are run from the login node to manage jobs, start interactive sessions, and monitor the status of cluster resources.

  • sbatch: Submit a batch job script to the queue. See Batch Jobs for details; a minimal example script is sketched after this list.
  • srun: Run a single job step. This is typically used to run an interactive shell for testing code or installing libraries on the build partition.
    • Example: srun --pty --time=30 --mem-per-cpu=500 --partition=build /bin/bash
      • --pty: specifies that this srun is for interactive use
      • --time=30: sets a time limit of 30 minutes (see Job Limits)
      • --mem-per-cpu=500: sets a memory allocation of 500 MB per CPU (see Job Limits)
      • --partition=build: sets the partition for the job step
      • /bin/bash: the command to run within the job step. In this case, this is the path to the bash shell for interactive use.
    • Once an srun interactive session is active, the prompt will change to include the partition name and the cluster node name (in this example, build and cls-cmp-c1-1).
  • squeue: List all running and queued Slurm jobs. The output can be limited to your own jobs by running squeue --me (see the example after this list).
  • scancel: Cancel a running or queued Slurm job by its ID number. The first column of squeue output is the job ID. If the job is part of a job array, the ID is shown with an underscore and the array task number appended.
    • You can kill all of your running and queued jobs with scancel --me
  • sacct-detail: Show your recent job history and the exit status of each job step. This is the primary tool for troubleshooting failed jobs (see Slurm Troubleshooting); a usage sketch follows this list.
  • sinfo: Show the status of all the Slurm partitions (illustrative output follows this list). The states listed in the output are:
    • idle: Nodes with no running jobs
    • alloc: Nodes with all CPU cores in use
    • mix: Nodes with some CPU cores in use and some idle. This often means the node is unavailable for new jobs because the running jobs have consumed all of its available memory.
    • drain/down: Nodes that are not currently available due to maintenance or problems.
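
A minimal sbatch workflow might look like the following sketch. The script name (job.sh), the program it runs, and the resource values are illustrative assumptions rather than recommendations from this documentation; the #SBATCH directives correspond to the command-line options described above.

    #!/bin/bash
    #SBATCH --time=30           # 30 minute time limit (see Job Limits)
    #SBATCH --mem-per-cpu=500   # 500 MB of memory per CPU (see Job Limits)

    ./my_program                # replace with the command you want to run

Submitting the script with sbatch job.sh prints the ID of the newly queued job.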
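
A combined squeue / scancel example: the job ID, user name, and timing values below are placeholders, and the node name cls-cmp-c1-1 is reused from the srun example above.

    $ squeue --me
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    12345     build     bash     jdoe  R       5:12      1 cls-cmp-c1-1
    $ scancel 12345

The header row matches the default squeue output format; your cluster may be configured to show additional columns.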
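
Because sacct-detail is a cluster-specific wrapper, only a bare invocation is assumed here; the second command is a comparable query using the standard sacct command, shown for comparison only.

    $ sacct-detail
    $ sacct --format=JobID,JobName,Partition,State,ExitCode

In the plain sacct output, the State and ExitCode columns show how each job step finished (see Slurm Troubleshooting).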
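
Illustrative sinfo output showing the node states described above. The partition names other than build, the node names other than cls-cmp-c1-1, and the time limits are placeholders; the header row is the default sinfo layout.

    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    build        up    2:00:00      1   idle cls-cmp-c1-1
    compute      up 2-00:00:00      3  alloc cls-cmp-c2-[1-3]
    compute      up 2-00:00:00      1    mix cls-cmp-c2-4
    compute      up 2-00:00:00      1  drain cls-cmp-c2-5

sinfo prints one row per partition and state combination, so a partition with nodes in several states appears on multiple lines.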