...

The HAL Slurm Wrapper Suite was designed to make using the HAL system easy and efficient. The current version, "swsuite-v0.23", includes:

srun (slurm command) → swrun : request resources to run interactive jobs.

sbatch (slurm command) → swbatch : request resources to submit a batch script to Slurm.

squeue (slurm command) → swqueue : display resource usage from all computing nodes.
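
For example, a single-GPU interactive session through the wrapper replaces several native srun options. The srun line below is only an illustrative equivalent built from standard Slurm flags; the exact options swrun passes to Slurm are not documented on this page.

Code Block
language: bash
title: swrun vs. native srun (illustrative)
# Interactive job through the wrapper: 1 GPU with the default 12 CPUs and 24-hour walltime.
swrun -p gpux1

# Roughly the equivalent request in plain Slurm (illustrative only; the wrapper
# picks the partition and fills in the defaults for you):
srun --partition=gpux1 --gres=gpu:1 --cpus-per-task=12 --time=24:00:00 --pty bash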

Rule of Thumb

  • Minimize the required input options.
  • Stay consistent with the original Slurm run-script format.
  • Submit jobs to a suitable partition based on the number of GPUs requested (or on the number of nodes for the CPU partition).
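
The last rule can be pictured as a lookup from the requested GPU count to a queue name. The shell function below is a minimal sketch of that mapping using a hypothetical helper name (pick_partition); it is not the swsuite source code.

Code Block
language: bash
title: GPU-count-to-partition rule of thumb (sketch)
# pick_partition: hypothetical helper illustrating how a GPU count maps to a queue.
pick_partition() {
    local ngpus=$1
    case "$ngpus" in
        0)        echo "cpu" ;;            # CPU-only work goes to the cpu queue (swrun also offers cpun1/2/4/8 by node count)
        1|2|3|4)  echo "gpux${ngpus}" ;;   # up to 4 GPUs fit on a single node
        8)        echo "gpux8" ;;          # 8 GPUs span 2 nodes
        12)       echo "gpux12" ;;         # 12 GPUs span 3 nodes
        16)       echo "gpux16" ;;         # 16 GPUs span 4 nodes
        *)        echo "unsupported GPU count: $ngpus" >&2; return 1 ;;
    esac
}

pick_partition 4   # prints gpux4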

Usage

  • swrun -p <partition_name> -c <cpu_per_gpu> -t <walltime> -r <reservation_name>
    • <partition_name> (required) : cpun1, cpun2, cpun4, cpun8, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
    • <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
    • <reservation_name> (optional) : reservation name granted to user.
    • example: swrun -p gpux4 -c 36 -t 72 (request a full node: 1x node, 4x gpus, 144x cpus, 72x hours)
  • swbatch <run_script>
    • <run_script> (required) : same format as an original Slurm batch script.
    • <job_name> (optional) : job name.
    • <output_file> (optional) : output file name.
    • <error_file> (optional) : error file name.
    • <partition_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
    • <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
    • <reservation_name> (optional) : reservation name granted to user.
    • example: swbatch demo.swb

      Code Block
      language: bash
      title: demo.swb
      #!/bin/bash
      
      #SBATCH --job-name="demo"
      #SBATCH --output="demo.%j.%N.out"
      #SBATCH --error="demo.%j.%N.err"
      #SBATCH --partition=gpux1
      
      srun hostname


  • swqueue
    • example: swqueue
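
Putting the three wrappers together, a typical session looks like the commands below; job IDs and the exact swqueue display will vary on a live system.

Code Block
language: bash
title: typical swsuite workflow (illustrative)
swbatch demo.swb      # submit the batch script shown above
swqueue               # view running jobs and per-node resource usage
swrun -p gpux2 -t 4   # or open an interactive session instead: 2 GPUs for 4 hours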

New Job Queues

Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPUs Allowed | Local Scratch | Description
gpu-debug | high | 4 hrs | 1 | 12-144 | 18-144 GB | 4 | none | designed to access 1 node to run a debug job
gpux1 | normal | 72 hrs | 1 | 12-36 | 18-54 GB | 1 | none | designed to access 1 GPU on 1 node to run sequential and/or parallel jobs
gpux2 | normal | 72 hrs | 1 | 24-72 | 36-108 GB | 2 | none | designed to access 2 GPUs on 1 node to run sequential and/or parallel jobs
gpux3 | normal | 72 hrs | 1 | 36-108 | 54-162 GB | 3 | none | designed to access 3 GPUs on 1 node to run sequential and/or parallel jobs
gpux4 | normal | 72 hrs | 1 | 48-144 | 72-216 GB | 4 | none | designed to access 4 GPUs on 1 node to run sequential and/or parallel jobs
gpux8 | low | 72 hrs | 2 | 48-144 | 72-216 GB | 8 | none | designed to access 8 GPUs on 2 nodes to run sequential and/or parallel jobs
gpux12 | low | 72 hrs | 3 | 48-144 | 72-216 GB | 12 | none | designed to access 12 GPUs on 3 nodes to run sequential and/or parallel jobs
gpux16 | low | 72 hrs | 4 | 48-144 | 72-216 GB | 16 | none | designed to access 16 GPUs on 4 nodes to run sequential and/or parallel jobs
cpu | normal | 72 hrs | 1-16 | 96-96 | 144-144 GB | 0 | none | designed to access 96 CPUs on 1-16 nodes to run sequential and/or parallel jobs
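
As a concrete illustration of the table, a batch script aimed at the gpux8 queue only needs to name that partition in the same format as demo.swb above. This is a minimal sketch; full example scripts follow in the next section.

Code Block
language: bash
title: gpux8 batch header (sketch)
#!/bin/bash

#SBATCH --job-name="demo8"
#SBATCH --output="demo8.%j.%N.out"
#SBATCH --error="demo8.%j.%N.err"
#SBATCH --partition=gpux8

srun hostname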

HAL Wrapper Suite Example Job Scripts

...