
For complete SLURM documentation, see https://slurm.schedmd.com/. Here we only show simple examples with system-specific instructions.

HAL Job Queues

| Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs Per Node | Max Memory Per CPU (GB) | Description |
|----------------|----------|--------------|-----------------------|-------------------|-------------------------|-------------|
| debug | high | 4 hrs | 1-1 | 144 | 1.5 | designed to access 1 node to run debug job |
| solo | normal | 72 hrs | 1-1 | 144 | 1.5 | designed to access 1 node to run sequential and/or parallel job |
| batch | low | 72 hrs | 2-16 | 144 | 1.5 | designed to access 2-16 nodes (up to 64 GPUs) to run parallel job |
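
The same limits can also be checked from SLURM itself using the standard sinfo command (generic SLURM options, nothing HAL-specific):

sinfo -s                     # summary of partitions, time limits, and node counts
sinfo -o "%P %l %D %c %m"    # partition, time limit, node count, CPUs per node, memory per node (MB)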

HAL Example Job Scripts (Recommended)

New users should check the example job scripts in "/opt/apps/samples-runscript" and request adequate resources; a minimal sketch of one such script is shown after the table below.

| Script Name | Job Type | Partition | Max Walltime | Nodes | CPU | GPU | Memory (GB) | Description |
|-------------|----------|-----------|--------------|-------|-----|-----|-------------|-------------|
| run_debug_00gpu_96cpu_216GB.sh | interactive | debug | 4:00:00 | 1 | 96 | 0 | 144 | submit interactive job, 1 full node for 4 hours, CPU-only task in "debug" partition |
| run_debug_01gpu_12cpu_18GB.sh | interactive | debug | 4:00:00 | 1 | 12 | 1 | 18 | submit interactive job, 25% of 1 full node for 4 hours, GPU task in "debug" partition |
| run_debug_02gpu_24cpu_36GB.sh | interactive | debug | 4:00:00 | 1 | 24 | 2 | 36 | submit interactive job, 50% of 1 full node for 4 hours, GPU task in "debug" partition |
| run_debug_04gpu_48cpu_72GB.sh | interactive | debug | 4:00:00 | 1 | 48 | 4 | 72 | submit interactive job, 1 full node for 4 hours, GPU task in "debug" partition |
| sub_solo_01node_01gpu_12cpu_18GB.sb | sbatch | solo | 72:00:00 | 1 | 12 | 1 | 18 | submit batch job, 25% of 1 full node for 72 hours, GPU task in "solo" partition |
| sub_solo_01node_02gpu_24cpu_36GB.sb | sbatch | solo | 72:00:00 | 1 | 24 | 2 | 36 | submit batch job, 50% of 1 full node for 72 hours, GPU task in "solo" partition |
| sub_solo_01node_04gpu_48cpu_72GB.sb | sbatch | solo | 72:00:00 | 1 | 48 | 4 | 72 | submit batch job, 1 full node for 72 hours, GPU task in "solo" partition |
| sub_batch_02node_08gpu_96cpu_144GB.sb | sbatch | batch | 72:00:00 | 2 | 96 | 8 | 144 | submit batch job, 2 full nodes for 72 hours, GPU task in "batch" partition |
| sub_batch_16node_64gpu_768cpu_1152GB.sb | sbatch | batch | 72:00:00 | 16 | 768 | 64 | 1152 | submit batch job, 16 full nodes for 72 hours, GPU task in "batch" partition |
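
For reference, a batch script of the kind listed above looks roughly like the following. This is a minimal sketch modeled on the sub_solo_01node_01gpu_12cpu_18GB.sb row (1 node, 12 CPUs, 1 V100 GPU, 18 GB, 72 hours), not the exact content of the installed sample; the job name and application commands are placeholders.

#!/bin/bash
#SBATCH --job-name=solo_01gpu        # placeholder job name
#SBATCH --partition=solo             # "solo" partition: 1 node, up to 72 hours
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12         # 12 CPU cores (25% of a node)
#SBATCH --gres=gpu:v100:1            # 1 V100 GPU
#SBATCH --mem-per-cpu=1500           # 1500 MB per CPU, 18 GB total
#SBATCH --time=72:00:00              # walltime limit of the "solo" partition

# replace with the actual application commands
hostname
nvidia-smi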

Native SLURM style

Submit Interactive Job with "srun"

# request 1 node with 12 CPU cores, 1 V100 GPU, and 1500 MB of memory per CPU
# for 1.5 hours in the "debug" partition, then start an interactive bash shell
srun --partition=debug --pty --nodes=1 \
  --ntasks-per-node=12 --cores-per-socket=12 --mem-per-cpu=1500 --gres=gpu:v100:1 \
  -t 01:30:00 --wait=0 \
  --export=ALL /bin/bash
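
Once the allocation is granted, the bash shell runs on the assigned compute node; exiting the shell ends the interactive job and releases the allocation.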

Submit Batch Job

sbatch [job_script]
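
For example, to submit one of the sample batch scripts listed above (after copying it to the working directory and adjusting it as needed):

sbatch sub_solo_01node_04gpu_48cpu_72GB.sb   # on success, prints "Submitted batch job [job_id]"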

Check Job Status

squeue                # check all jobs from all users
squeue -u [user_name] # check all jobs belonging to user_name

Cancel Running Job

scancel [job_id] # cancel job with [job_id]

PBS style

Some PBS commands are supported by SLURM.

Check Node Status

pbsnodes

Check Job Status

qstat -f [job_number]

Check Queue Status

qstat

Delete Job

qdel [job_number]

Submit Batch Job

$ cat test.pbs
#!/usr/bin/sh
#PBS -N test
#PBS -l nodes=1
#PBS -l walltime=10:00

hostname
$ qsub test.pbs
107
$ cat test.pbs.o107
hal01.hal.ncsa.illinois.edu