...

  • swrun -q <queue_name> -c <cpu_per_gpu> -t <walltime>
    • <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
    • <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
    • example: swrun -q gpux4 -c 36 -t 72 (request a full node: 1x node, 4x GPUs, 144x CPUs, 72 hours)
  • swbatch <run_script>
    • <run_script> (required) : same as an original SLURM batch script.
    • <job_name> (required) : job name.
    • <output_file> (required) : output file name.
    • <error_file> (required) : error file name.
    • <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
    • <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
    • example: swbatch demo.sb (see the run-script sketch below)
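
A minimal sketch of what a run script such as demo.sb might contain, assuming standard SLURM #SBATCH directives (the job name, output/error file names, and workload below are illustrative, not the contents of an actual HAL sample script):

Code Block
#!/bin/bash
#SBATCH --job-name="demo"          # <job_name>
#SBATCH --output="demo.%j.out"     # <output_file> (%j expands to the job id)
#SBATCH --error="demo.%j.err"      # <error_file>
#SBATCH --partition=gpux1          # <queue_name>
#SBATCH --time=24:00:00            # <walltime>, up to 72 hours

srun hostname                      # replace with the actual workload

Submitting it with "swbatch demo.sb" queues the job on the gpux1 partition with the defaults described above.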

New Job Queues

Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Memory Per Node Allowed | GPUs Allowed | Local Scratch | Description
gpu-debug      | high     | 4 hrs        | 1             | 12-144                        | 18-144 GB                       | 4            | none          |
gpux1          | normal   | 72 hrs       | 1             | 12-36                         | 18-54 GB                        | 1            | none          |
gpux2          | normal   | 72 hrs       | 1             | 24-72                         | 36-108 GB                       | 2            | none          |
gpux3          | normal   | 72 hrs       | 1             | 36-108                        | 54-162 GB                       | 3            | none          |
gpux4          | normal   | 72 hrs       | 1             | 48-144                        | 72-216 GB                       | 4            | none          |
cpu            | normal   | 72 hrs       | 1             | 96-96                         | 144-144 GB                      | 0            | none          |
gpux8          | low      | 72 hrs       | 2             | 48-144                        | 72-216 GB                       | 8            | none          |
gpux12         | low      | 72 hrs       | 3             | 48-144                        | 72-216 GB                       | 12           | none          |
gpux16         | low      | 72 hrs       | 4             | 48-144                        | 72-216 GB                       | 16           | none          |
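
For example, using the swrun syntax above, a 2-GPU job on the gpux2 queue with 24 CPUs per GPU and a 48-hour walltime could be requested as follows (the specific values here are illustrative):

Code Block
swrun -q gpux2 -c 24 -t 48   # 1x node, 2x GPUs, 48x CPUs, 48 hours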

Native SLURM style

Submit Interactive Job with "srun"

...

Code Block
scancel [job_id] # cancel job with [job_id]

Job Queues

Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs Per Node | Max Memory Per CPU (GB) | Local Scratch (GB) | Description
debug          | high     | 4 hrs        | 1-1                   | 144               | 1.5                     | None               | designed to access 1 node to run a debug job
solo           | normal   | 72 hrs       | 1-1                   | 144               | 1.5                     | None               | designed to access 1 node to run sequential and/or parallel jobs
ssd            | normal   | 72 hrs       | 1-1                   | 144               | 1.5                     | 220                | similar to the solo partition with extra local scratch; limited to hal[01-04]
batch          | low      | 72 hrs       | 2-16                  | 144               | 1.5                     | None               | designed to access 2-16 nodes (up to 64 GPUs) to run parallel jobs
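
As an illustrative sketch of the native SLURM style (using standard srun options; the exact GRES specification and defaults on HAL may differ), an interactive session on the debug partition could be requested with:

Code Block
srun --partition=debug --nodes=1 --ntasks=12 --gres=gpu:1 --time=4:00:00 --pty bash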

HAL Example Job Scripts

New users should review the example job scripts in "/opt/apps/samples-runscript" and request resources appropriate to their jobs.

Script Name                             | Job Type    | Partition | Max Walltime | Nodes | CPUs | GPUs | Memory (GB) | Description
run_debug_00gpu_96cpu_216GB.sh          | interactive | debug     | 4:00:00      | 1     | 96   | 0    | 144         | submit interactive job, 1 full node for 4 hours, CPU-only task in "debug" partition
run_debug_01gpu_12cpu_18GB.sh           | interactive | debug     | 4:00:00      | 1     | 12   | 1    | 18          | submit interactive job, 25% of 1 full node for 4 hours, GPU task in "debug" partition
run_debug_02gpu_24cpu_36GB.sh           | interactive | debug     | 4:00:00      | 1     | 24   | 2    | 36          | submit interactive job, 50% of 1 full node for 4 hours, GPU task in "debug" partition
run_debug_04gpu_48cpu_72GB.sh           | interactive | debug     | 4:00:00      | 1     | 48   | 4    | 72          | submit interactive job, 1 full node for 4 hours, GPU task in "debug" partition
sub_solo_01node_01gpu_12cpu_18GB.sb     | sbatch      | solo      | 72:00:00     | 1     | 12   | 1    | 18          | submit batch job, 25% of 1 full node for 72 hours, GPU task in "solo" partition
sub_solo_01node_02gpu_24cpu_36GB.sb     | sbatch      | solo      | 72:00:00     | 1     | 24   | 2    | 36          | submit batch job, 50% of 1 full node for 72 hours, GPU task in "solo" partition
sub_solo_01node_04gpu_48cpu_72GB.sb     | sbatch      | solo      | 72:00:00     | 1     | 48   | 4    | 72          | submit batch job, 1 full node for 72 hours, GPU task in "solo" partition
sub_ssd_01node_01gpu_12cpu_18GB.sb      | sbatch      | ssd       | 72:00:00     | 1     | 12   | 1    | 18          | submit batch job, 25% of 1 full node for 72 hours, GPU task in "ssd" partition
sub_batch_02node_08gpu_96cpu_144GB.sb   | sbatch      | batch     | 72:00:00     | 2     | 96   | 8    | 144         | submit batch job, 2 full nodes for 72 hours, GPU task in "batch" partition
sub_batch_16node_64gpu_768cpu_1152GB.sb | sbatch      | batch     | 72:00:00     | 16    | 768  | 64   | 1152        | submit batch job, 16 full nodes for 72 hours, GPU task in "batch" partition
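
As a hedged sketch of what one of these sample scripts requests (reconstructed from the table above with standard SLURM directives; this is not the verbatim contents of the file under /opt/apps/samples-runscript), sub_solo_01node_01gpu_12cpu_18GB.sb corresponds roughly to:

Code Block
#!/bin/bash
#SBATCH --job-name="solo_1gpu"     # illustrative job name
#SBATCH --partition=solo
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12       # 12 CPUs (25% of a node)
#SBATCH --gres=gpu:1               # 1 GPU
#SBATCH --mem=18G                  # 18 GB of memory
#SBATCH --time=72:00:00
#SBATCH --output="%x.%j.out"
#SBATCH --error="%x.%j.err"

srun python train.py               # illustrative workload; replace with the actual command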

PBS style

Some PBS commands are supported by SLURM.
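
Assuming SLURM's Torque/PBS compatibility wrappers are installed on the system, the common mappings look roughly like this (availability of each wrapper on HAL should be checked on the cluster itself):

Code Block
qsub run_script.sb   # roughly equivalent to: sbatch run_script.sb
qstat                # roughly equivalent to: squeue
qdel [job_id]        # roughly equivalent to: scancel [job_id]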

...