HAL Job Queues
| Partition Name | Priority | Max Walltime | Max Nodes/Job | Description |
| --- | --- | --- | --- | --- |
| debug | high | 12 hrs | 1 | access to 1 GPU for debug or short-term jobs |
| solo | normal | 72 hrs | 1 | access to 1 GPU for long-term jobs |
| batch | normal | 72 hrs | 16 | access to up to 64 GPUs for parallel jobs |
Native SLURM style
Submit Interactive Job with "srun"
```
srun --partition=debug --pty --nodes=1 \
     --ntasks-per-node=8 --gres=gpu:v100:1 \
     -t 01:30:00 --wait=0 \
     --export=ALL /bin/bash
```
Submit Batch Job
```
sbatch [job_script]
```
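A batch script carries its resource requests as `#SBATCH` directives, mirroring the `srun` flags shown above. A minimal sketch (the script name and job name are illustrative assumptions, not site requirements):

```shell
#!/bin/bash
# Hypothetical job script (e.g. saved as gpu_job.sb and submitted with
# `sbatch gpu_job.sb`). The directives mirror the interactive srun
# example above -- adjust partition, GPU count, and walltime as needed.
#SBATCH --job-name=gpu_test
#SBATCH --partition=solo
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:v100:1
#SBATCH --time=01:30:00

# Commands below run on the allocated node; here we just report where
# the job landed.
hostname
```

By default SLURM writes the job's stdout to `slurm-<jobid>.out` in the submission directory.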
Check Job Status
```
squeue -u [username]
```
Cancel Running Job
```
scancel [job_id]
```
To cancel all of your own jobs at once, use `scancel -u [username]` instead.
PBS style
SLURM provides compatibility wrappers for some common PBS commands.
Check Node Status
```
pbsnodes
```
Check Job Status
```
qstat -f [job_number]
```
Check Queue Status
```
qstat
```
Delete Job
```
qdel [job_number]
```
Submit Batch Job
```
$ cat test.pbs
#!/usr/bin/sh
#PBS -N test
#PBS -l nodes=1
#PBS -l walltime=10:00
hostname

$ qsub test.pbs
107
$ cat test.pbs.o107
hal01.hal.ncsa.illinois.edu
```
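For comparison, the same job can be written as a native SLURM script using the corresponding `#SBATCH` directives (the file name `test.sb` is an illustrative assumption; submit it with `sbatch test.sb`):

```shell
#!/bin/bash
# Native-SLURM equivalent of the PBS script above. PBS directives map
# to SBATCH ones: -N -> --job-name, -l nodes=1 -> --nodes=1,
# -l walltime=10:00 -> --time=10:00.
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --time=10:00

# Same payload as the PBS example: print the execution host.
hostname
```

Unlike PBS, which writes `test.pbs.o<jobid>`, SLURM writes the output to `slurm-<jobid>.out` unless `--output` is given.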