...

For complete SLURM documentation, see https://slurm.schedmd.com/. Here we only show simple examples with system-specific instructions.

HAL Slurm Wrapper Suite (Recommended)

Introduction

The HAL Slurm Wrapper Suite is designed to help users run jobs on the HAL system easily and efficiently. The current version, "swsuite-v0.1", includes two wrapper commands:

srun → swrun : requests resources to run an interactive job.

sbatch → swbatch : requests resources to submit a batch script to Slurm.

Usage

  • swrun -q <queue_name> -c <cpu_per_gpu> -t <walltime>
    • <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : number of CPUs per GPU; 12 CPUs by default, range from 12 to 36 CPUs.
    • <walltime> (optional) : 24 hours by default, range from 1 hour to 72 hours.
    • example: swrun -q gpux4 -c 36 -t 72 (requests a full node: 1x node, 4x GPUs, 144x CPUs, 72 hours)
  • swbatch <run_script>
    • <run_script> (required) : same as an original Slurm batch script; the values below are set inside <run_script> itself (a sketch of such a script is shown after this list).
    • <job_name> (required) : job name.
    • <output_file> (required) : output file name.
    • <error_file> (required) : error file name.
    • <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : number of CPUs per GPU; 12 CPUs by default, range from 12 to 36 CPUs.
    • <walltime> (optional) : 24 hours by default, range from 1 hour to 72 hours.
    • example: swbatch demo.sb
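
A minimal sketch of a run script such as demo.sb, assuming swbatch accepts standard #SBATCH directives for the values listed above (the exact directive names expected by the wrapper may differ; check the samples under "/opt/apps/samples-runscript"):

    #!/bin/bash
    #SBATCH --job-name=demo            # <job_name>
    #SBATCH --output=demo.%j.out       # <output_file>
    #SBATCH --error=demo.%j.err        # <error_file>
    #SBATCH --partition=gpux1          # <queue_name>
    #SBATCH --time=24:00:00            # <walltime>, up to 72 hours

    # Payload: print the allocated host as a simple smoke test
    srun hostname

The script would then be submitted with "swbatch demo.sb" and monitored with the usual Slurm tools (e.g. squeue -u $USER).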


New Job Queues

Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPUs Allowed | Local Scratch | Description
gpu-debug      | high     | 4 hrs        | 1             | 12-144                        | 18-144 GB                    | 4            | none          |
gpux1          | normal   | 72 hrs       | 1             | 12-36                         | 18-54 GB                     | 1            | none          |
gpux2          | normal   | 72 hrs       | 1             | 24-72                         | 36-108 GB                    | 2            | none          |
gpux3          | normal   | 72 hrs       | 1             | 36-108                        | 54-162 GB                    | 3            | none          |
gpux4          | normal   | 72 hrs       | 1             | 48-144                        | 72-216 GB                    | 4            | none          |
cpu            | normal   | 72 hrs       | 1             | 96-96                         | 144-144 GB                   | 0            | none          |
gpux8          | low      | 72 hrs       | 2             | 48-144                        | 72-216 GB                    | 8            | none          |
gpux12         | low      | 72 hrs       | 3             | 48-144                        | 72-216 GB                    | 12           | none          |
gpux16         | low      | 72 hrs       | 4             | 48-144                        | 72-216 GB                    | 16           | none          |
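
The current state of these partitions can be checked with the standard Slurm query tools; the partition names below are taken from the table above:

    # Show node counts and availability for two of the GPU partitions
    sinfo -p gpux1,gpux4

    # List your own pending and running jobs
    squeue -u $USER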

Traditional Job Queues

Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs Per Node | Max Memory Per CPU (GB) | Local Scratch (GB) | Description
debug          | high     | 4 hrs        | 1-1                   | 144               | 1.5                     | None               | designed to access 1 node to run a debug job
solo           | normal   | 72 hrs       | 1-1                   | 144               | 1.5                     | None               | designed to access 1 node to run a sequential and/or parallel job
ssd            | normal   | 72 hrs       | 1-1                   | 144               | 1.5                     | 220                | similar to the solo partition with extra local scratch, limited to hal[01-04]
batch          | low      | 72 hrs       | 2-16                  | 144               | 1.5                     | None               | designed to access 2-16 nodes (up to 64 GPUs) to run a parallel job
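
Jobs for the traditional queues can be submitted directly with sbatch. A minimal sketch for the solo partition, using only standard Slurm directives (the resource values are illustrative and should stay within the limits in the table above):

    #!/bin/bash
    #SBATCH --job-name=solo-demo
    #SBATCH --partition=solo           # one of: debug, solo, ssd, batch
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16       # up to 144 CPUs per node on this partition
    #SBATCH --time=02:00:00            # max walltime is 72 hrs
    #SBATCH --output=solo-demo.%j.out

    # Payload: print the allocated host as a simple smoke test
    srun hostname

Submit it with "sbatch solo-demo.sb" (the script name is illustrative).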

HAL Example Job Scripts

...

New users should check the example job scripts at "/opt/apps/samples-runscript" and request adequate resources.
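
For example, the samples can be listed and copied into a home directory before editing and submitting them (<script_name> below is a placeholder for whichever sample is of interest):

    ls /opt/apps/samples-runscript
    cp /opt/apps/samples-runscript/<script_name> ~/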

...