For complete SLURM documentation, see https://slurm.schedmd.com/. This page shows only simple examples with system-specific instructions.
HAL Job Queues
Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs per Node | Memory per CPU (GB) | Description |
---|---|---|---|---|---|---|
debug | high | 12 hrs | 1-1 | 144 | 1.5 | designed to access 1 node to run a debug or short-term job |
solo | normal | 72 hrs | 1-1 | 144 | 1.5 | designed to access 1 node to run a long-term job |
batch | low | 72 hrs | 2-16 | 144 | 1.5 | designed to access 2-16 nodes (up to 64 GPUs) to run a parallel job |
HAL Example Job Scripts (Recommended)
New users should start from the example job scripts in "/opt/apps/samples-runscript" and request only the resources their jobs need.
Script Name | Job Type | Partition | Max Walltime | Nodes | CPUs | GPUs | Memory (GB) | Description |
---|---|---|---|---|---|---|---|---|
run_debug_00gpu_144cpu_216GB.sh | interactive | debug | 12:00:00 | 1 | 144 | 0 | 216 | submit an interactive job: 1 full node for 12 hours, CPU-only task, in the "debug" partition |
run_debug_01gpu_12cpu_18GB.sh | interactive | debug | 12:00:00 | 1 | 12 | 1 | 18 | submit an interactive job: 25% of 1 full node for 12 hours, GPU task, in the "debug" partition |
run_debug_02gpu_24cpu_36GB.sh | interactive | debug | 12:00:00 | 1 | 24 | 2 | 36 | submit an interactive job: 50% of 1 full node for 12 hours, GPU task, in the "debug" partition |
run_debug_04gpu_48cpu_72GB.sh | interactive | debug | 12:00:00 | 1 | 48 | 4 | 72 | submit an interactive job: 1 full node for 12 hours, GPU task, in the "debug" partition |
sub_solo_01node_01gpu_12cpu_18GB.sb | sbatch | solo | 72:00:00 | 1 | 12 | 1 | 18 | submit a batch job: 25% of 1 full node for 72 hours, GPU task, in the "solo" partition |
sub_solo_01node_02gpu_24cpu_36GB.sb | sbatch | solo | 72:00:00 | 1 | 24 | 2 | 36 | submit a batch job: 50% of 1 full node for 72 hours, GPU task, in the "solo" partition |
sub_solo_01node_04gpu_48cpu_72GB.sb | sbatch | solo | 72:00:00 | 1 | 48 | 4 | 72 | submit a batch job: 1 full node for 72 hours, GPU task, in the "solo" partition |
sub_batch_02node_08gpu_96cpu_144GB.sb | sbatch | batch | 72:00:00 | 2 | 96 | 8 | 144 | submit a batch job: 2 full nodes for 72 hours, GPU task, in the "batch" partition |
sub_batch_16node_64gpu_768cpu_1152GB.sb | sbatch | batch | 72:00:00 | 16 | 768 | 64 | 1152 | submit a batch job: 16 full nodes for 72 hours, GPU task, in the "batch" partition |
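As a rough sketch of what one of the sample batch scripts above might contain (the directive values follow the 1-GPU "solo" row of the table; the workload line is a hypothetical placeholder, and the actual files under "/opt/apps/samples-runscript" should be preferred):

```shell
#!/bin/bash
# Sketch of a 1-GPU "solo" batch script; values mirror
# sub_solo_01node_01gpu_12cpu_18GB.sb in the table above.
#SBATCH --job-name=gpu_job
#SBATCH --partition=solo
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --gres=gpu:v100:1
#SBATCH --mem-per-cpu=1500

# Replace with the actual workload.
nvidia-smi
```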
Native SLURM style
Submit Interactive Job with "srun"
```shell
srun --partition=debug --pty --nodes=1 \
     --ntasks-per-node=12 --cores-per-socket=12 --mem-per-cpu=1500 \
     --gres=gpu:v100:1 \
     -t 01:30:00 --wait=0 \
     --export=ALL /bin/bash
```
Submit Batch Job
```shell
sbatch [job_script]
```
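For example, a minimal CPU-only job could look like this (the script name `hello.sb` is hypothetical, and the job ID in the output is illustrative):

```shell
$ cat hello.sb
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=solo
#SBATCH --nodes=1
#SBATCH --time=00:10:00
hostname
$ sbatch hello.sb
Submitted batch job 108
```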
Check Job Status
```shell
squeue                 # check all jobs from all users
squeue -u [user_name]  # check all jobs belonging to [user_name]
```
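The output columns of squeue can be tailored with the `-o` format option (these format specifiers are standard SLURM, not HAL-specific):

```shell
# Job ID, partition, job name, state, elapsed time, and nodes/reason
squeue -u $USER -o "%.10i %.9P %.20j %.8T %.10M %R"
```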
Cancel Running Job
```shell
scancel [job_id]  # cancel job with [job_id]
```
PBS style
SLURM supports some common PBS-style commands.
Check Node Status
```shell
pbsnodes
```
Check Job Status
```shell
qstat -f [job_number]
```
Check Queue Status
```shell
qstat
```
Delete Job
```shell
qdel [job_number]
```
Submit Batch Job
```shell
$ cat test.pbs
#!/usr/bin/sh
#PBS -N test
#PBS -l nodes=1
#PBS -l walltime=10:00
hostname
$ qsub test.pbs
107
$ cat test.pbs.o107
hal01.hal.ncsa.illinois.edu
```