For complete SLURM documentation, see https://slurm.schedmd.com/. Here we only show simple examples with system-specific instructions.

HAL Slurm Wrapper Suite (Recommended)

Introduction

The HAL Slurm Wrapper Suite is designed to help users work on the HAL system easily and efficiently. The current version is "swsuite-v0.4", which includes the following wrappers around the standard Slurm commands:

srun (Slurm command) → swrun : request resources to run interactive jobs.

sbatch (Slurm command) → swbatch : request resources and submit a batch script to Slurm.

squeue (Slurm command) → swqueue : check currently running jobs and the status of computational resources.

Rule of Thumb

Usage

New Job Queues

| Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPU Allowed | Local Scratch | Description |
|---|---|---|---|---|---|---|---|---|
| gpux1 | normal | 72 hrs | 1 | 16-40 | 21.6-48 GB | 1 | none | designed to access 1 GPU on 1 node to run sequential and/or parallel jobs. |
| gpux2 | normal | 72 hrs | 1 | 32-80 | 36-108 GB | 2 | none | designed to access 2 GPUs on 1 node to run sequential and/or parallel jobs. |
| gpux3 | normal | 72 hrs | 1 | 48-120 | 54-162 GB | 3 | none | designed to access 3 GPUs on 1 node to run sequential and/or parallel jobs. |
| gpux4 | normal | 72 hrs | 1 | 64-160 | 72-216 GB | 4 | none | designed to access 4 GPUs on 1 node to run sequential and/or parallel jobs. |
| gpux8 | normal | 72 hrs | 2 | 64-160 | 72-216 GB | 8 | none | designed to access 8 GPUs on 2 nodes to run sequential and/or parallel jobs. |
| gpux12 | normal | 72 hrs | 3 | 64-160 | 72-216 GB | 12 | none | designed to access 12 GPUs on 3 nodes to run sequential and/or parallel jobs. |
| gpux16 | normal | 72 hrs | 4 | 64-160 | 72-216 GB | 16 | none | designed to access 16 GPUs on 4 nodes to run sequential and/or parallel jobs. |
| cpu_mini | normal | 72 hrs | 1 | 8-8 | | | none | |
| cpun1 | normal | 72 hrs | 1 | 96-96 | 144-144 GB | 0 | none | designed to access 96 CPUs on 1-16 nodes to run sequential and/or parallel jobs. |
| cpun2 | normal | 72 hrs | 2 | 96-96 | | | none | |
| cpun4 | normal | 72 hrs | 4 | 96-96 | | | none | |
| cpun8 | normal | 72 hrs | 8 | 96-96 | | | none | |
| cpun16 | normal | 72 hrs | 16 | 96-96 | | | none | |

HAL Wrapper Suite Example Job Scripts

New users should check the example job scripts at "/opt/samples/runscripts" and request adequate resources.

| Script Name | Job Type | Partition | Walltime | Nodes | CPU | GPU | Memory | Description |
|---|---|---|---|---|---|---|---|---|
| run_gpux1_12cpu_24hrs.sh | interactive | gpux1 | 24 hrs | 1 | 12 | 1 | 18 GB | submit interactive job, 1x node for 24 hours w/ 12x CPU 1x GPU task in "gpux1" partition. |
| run_gpux2_24cpu_24hrs.sh | interactive | gpux2 | 24 hrs | 1 | 24 | 2 | 36 GB | submit interactive job, 1x node for 24 hours w/ 24x CPU 2x GPU task in "gpux2" partition. |
| sub_gpux1_12cpu_24hrs.sb | batch | gpux1 | 24 hrs | 1 | 12 | 1 | 18 GB | submit batch job, 1x node for 24 hours w/ 12x CPU 1x GPU task in "gpux1" partition. |
| sub_gpux2_24cpu_24hrs.sb | batch | gpux2 | 24 hrs | 1 | 24 | 2 | 36 GB | submit batch job, 1x node for 24 hours w/ 24x CPU 2x GPU task in "gpux2" partition. |
| sub_gpux4_48cpu_24hrs.sb | batch | gpux4 | 24 hrs | 1 | 48 | 4 | 72 GB | submit batch job, 1x node for 24 hours w/ 48x CPU 4x GPU task in "gpux4" partition. |
| sub_gpux8_96cpu_24hrs.sb | batch | gpux8 | 24 hrs | 2 | 96 | 8 | 144 GB | submit batch job, 2x node for 24 hours w/ 96x CPU 8x GPU task in "gpux8" partition. |
| sub_gpux16_192cpu_24hrs.sb | batch | gpux16 | 24 hrs | 4 | 192 | 16 | 288 GB | submit batch job, 4x node for 24 hours w/ 192x CPU 16x GPU task in "gpux16" partition. |
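
The contents of the scripts in "/opt/samples/runscripts" are not reproduced on this page. As a rough sketch only, a native Slurm batch script requesting the same resources as the sub_gpux1_12cpu_24hrs.sb row might look like the one written out below. The file name my_gpux1_job.sb and the job name gpux1_demo are hypothetical, the gpu:v100 resource string and the 1500 MB per CPU figure are borrowed from the srun example on this page, and the real sample script may well differ.

```shell
#!/bin/bash
# Write a hypothetical batch script to the current directory.
# Nothing is submitted by this snippet.
cat > my_gpux1_job.sb <<'EOF'
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=gpux1_demo
# Partition giving access to 1 GPU on 1 node.
#SBATCH --partition=gpux1
#SBATCH --nodes=1
# 12 CPUs, as in the table row.
#SBATCH --ntasks-per-node=12
# 12 x 1500 MB = 18000 MB, roughly the 18 GB in the table row.
#SBATCH --mem-per-cpu=1500
# Assumed GPU resource string, taken from the srun example.
#SBATCH --gres=gpu:v100:1
# 24 hours of walltime.
#SBATCH --time=24:00:00

hostname
EOF

echo "wrote my_gpux1_job.sb"
```

On the cluster, such a file would be submitted with "sbatch my_gpux1_job.sb". Starting from the provided sample scripts is still the safer route, since they reflect the site configuration.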

Native SLURM style

Submit Interactive Job with "srun"

srun --partition=debug --pty --nodes=1 \
     --ntasks-per-node=12 --cores-per-socket=3 \
     --threads-per-core=4 --sockets-per-node=1 \
     --mem-per-cpu=1500 --gres=gpu:v100:1 \
     --time 01:30:00 --wait=0 \
     --export=ALL /bin/bash
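
The topology flags in this command fit together: 1 socket with 3 cores of 4 hardware threads each gives 12 logical CPUs, which matches --ntasks-per-node=12, and at 1500 MB per CPU (--mem-per-cpu is interpreted in megabytes by default) the request comes to 18000 MB, roughly the 18 GB listed for the 12-CPU sample scripts. The following is a small sanity check that may help when adapting the numbers; it assumes each hardware thread is counted as one CPU, and the variable names are illustrative only.

```shell
#!/bin/bash
# Values copied from the srun example above.
sockets_per_node=1
cores_per_socket=3
threads_per_core=4
mem_per_cpu_mb=1500

# Logical CPUs implied by the topology flags
# (assumes one CPU per hardware thread).
cpus=$(( sockets_per_node * cores_per_socket * threads_per_core ))

# Total memory implied by --mem-per-cpu, in MB.
mem_mb=$(( cpus * mem_per_cpu_mb ))

echo "logical CPUs per node: ${cpus}"
echo "total memory: ${mem_mb} MB"
```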

Submit Batch Job

sbatch [job_script]

Check Job Status

squeue                # check all jobs from all users
squeue -u [user_name] # check all jobs belonging to [user_name]

Cancel Running Job

scancel [job_id] # cancel job with [job_id]
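
The submit, check, and cancel commands above are often chained in scripts, which requires capturing the job ID that sbatch prints in its "Submitted batch job <id>" line. The sketch below shows one way to extract it; parse_job_id is a hypothetical helper, and the demonstration uses a canned line so that it runs without Slurm.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's
# "Submitted batch job <id>" output line.
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration with a canned line instead of a real submission.
job_id=$(echo "Submitted batch job 107" | parse_job_id)
echo "job id: ${job_id}"

# On the cluster, the same helper could be used like this:
#   job_id=$(sbatch [job_script] | parse_job_id)
#   squeue -j "$job_id"    # check only this job
#   scancel "$job_id"      # cancel it
```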

PBS style

Some PBS commands are supported by SLURM.

Check Node Status

pbsnodes

Check Job Status

qstat -f [job_number]

Check Queue Status

qstat

Delete Job

qdel [job_number]

Submit Batch Job

$ cat test.pbs
#!/usr/bin/sh
#PBS -N test
#PBS -l nodes=1
#PBS -l walltime=10:00

hostname
$ qsub test.pbs
107
$ cat test.pbs.o107
hal01.hal.ncsa.illinois.edu
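
For comparison, the PBS directives in test.pbs map onto native Slurm directives fairly directly. The sketch below writes an equivalent Slurm script; the file name test.sb is hypothetical, and the mapping covers only the three directives used in this example.

```shell
#!/bin/bash
# Write a Slurm version of test.pbs to the current directory.
# Nothing is submitted by this snippet.
cat > test.sb <<'EOF'
#!/bin/bash
# PBS: -N test            ->  Slurm: --job-name=test
#SBATCH --job-name=test
# PBS: -l nodes=1         ->  Slurm: --nodes=1
#SBATCH --nodes=1
# PBS: -l walltime=10:00  ->  Slurm: --time=10:00 (minutes:seconds)
#SBATCH --time=10:00

hostname
EOF

echo "wrote test.sb"
```

When submitted with "sbatch test.sb", the output goes by default to a file named slurm-<job_id>.out in the submission directory rather than to test.pbs.o<job_id>.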