...
The HAL Slurm Wrapper Suite (swsuite) is designed to make the HAL system easy and efficient to use. The current version, swsuite-v0.23, includes:
srun (Slurm command) → swrun : requests resources to run interactive jobs.
sbatch (Slurm command) → swbatch : requests resources to submit a batch script to Slurm.
squeue (Slurm command) → swqueue : displays resource usage from all compute nodes.
Rules of Thumb
- Minimize the required input options.
- Stay consistent with the original Slurm run-script format.
- Submit jobs to a suitable partition based on the number of GPUs requested (number of nodes for the CPU partition); see the sketch after the partition table below.
Usage
- swrun -p <partition_name> -c <cpu_per_gpu> -t <walltime> -r <reservation_name>
- <partition_name> (required) : cpu, cpun1, cpun2, cpun4, cpun8, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
- <cpu_per_gpu> (optional) : 12 CPUs per GPU (default); valid range is 12 to 36.
- <walltime> (optional) : 24 hours (default); valid range is 1 to 72 hours.
- <reservation_name> (optional) : reservation name granted to user.
- example: swrun -p gpux4 -c 36 -t 72 (requests a full node: 1x node, 4x GPUs, 144x CPUs, 72x hours)
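A few more illustrative invocations, assuming the defaults listed above (the reservation name below is hypothetical):

```bash
# 1 GPU, default 12 CPUs per GPU, default 24-hour walltime
swrun -p gpux1

# 2 GPUs with 24 CPUs per GPU (48 CPUs total) for 12 hours
swrun -p gpux2 -c 24 -t 12

# 4 GPUs under a reservation (hypothetical reservation name "my_resv")
swrun -p gpux4 -r my_resv
```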
- swbatch <run_script>
- <run_script> (required) : same format as a standard Slurm batch script; the options below are set inside the script via #SBATCH directives (see the example below).
- <job_name> (optional) : job name.
- <output_file> (optional) : output file name.
- <error_file> (optional) : error file name.
- <partition_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
- <cpu_per_gpu> (optional) : 12 CPUs per GPU (default); valid range is 12 to 36.
- <walltime> (optional) : 24 hours (default); valid range is 1 to 72 hours.
- <reservation_name> (optional) : reservation name granted to user.
- example: swbatch demo.sbswb

demo.sbswb:
```bash
#!/bin/bash
#SBATCH --job-name="demo"
#SBATCH --output="demo.%j.%N.out"
#SBATCH --error="demo.%j.%N.err"
#SBATCH --partition=gpux1

srun hostname
```
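In the output and error file names, Slurm expands %j to the job ID and %N to the node's short hostname, so repeated runs do not overwrite each other. As a sketch of a multi-node submission, assuming swbatch accepts the same directives (the script and job names here are hypothetical):

demo8.sbswb:
```bash
#!/bin/bash
#SBATCH --job-name="demo8"
#SBATCH --output="demo8.%j.%N.out"
#SBATCH --error="demo8.%j.%N.err"
#SBATCH --partition=gpux8

# gpux8 spans 2 nodes with 8 GPUs total (see the partition table below)
srun hostname
```

Submit it the same way: swbatch demo8.sbswb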
- swqueue
- example: swqueue
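swqueue takes no options in the usage shown above. For reference, the native Slurm views it roughly corresponds to (an assumption about what it aggregates, not a documented equivalence):

```bash
swqueue            # wrapper: resource usage across all compute nodes
squeue -u "$USER"  # native: list only your own jobs
sinfo              # native: partition and node state
```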
New Job Queues
Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPU Allowed | Local Scratch | Description
---|---|---|---|---|---|---|---|---
gpu-debug | high | 4 hrs | 1 | 12-144 | 18-144 GB | 4 | none | designed to access 1 node to run debug jobs.
gpux1 | normal | 72 hrs | 1 | 12-36 | 18-54 GB | 1 | none | designed to access 1 GPU on 1 node to run sequential and/or parallel jobs.
gpux2 | normal | 72 hrs | 1 | 24-72 | 36-108 GB | 2 | none | designed to access 2 GPUs on 1 node to run sequential and/or parallel jobs.
gpux3 | normal | 72 hrs | 1 | 36-108 | 54-162 GB | 3 | none | designed to access 3 GPUs on 1 node to run sequential and/or parallel jobs.
gpux4 | normal | 72 hrs | 1 | 48-144 | 72-216 GB | 4 | none | designed to access 4 GPUs on 1 node to run sequential and/or parallel jobs.
gpux8 | low | 72 hrs | 2 | 48-144 | 72-216 GB | 8 | none | designed to access 8 GPUs on 2 nodes to run sequential and/or parallel jobs.
gpux12 | low | 72 hrs | 3 | 48-144 | 72-216 GB | 12 | none | designed to access 12 GPUs on 3 nodes to run sequential and/or parallel jobs.
gpux16 | low | 72 hrs | 4 | 48-144 | 72-216 GB | 16 | none | designed to access 16 GPUs on 4 nodes to run sequential and/or parallel jobs.
cpu | normal | 72 hrs | 1-16 | 96-96 | 144-144 GB | 0 | none | designed to access 96 CPUs on 1-16 nodes to run sequential and/or parallel jobs.
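The routing rule above ("submit to a suitable partition based on the number of GPUs") reduces to a simple lookup over this table. A minimal sketch in bash; the helper name pick_partition is hypothetical and not part of swsuite:

```bash
#!/bin/bash
# Hypothetical helper mirroring the table above: 1-4 GPUs fit on one node
# (gpux1-gpux4); 8, 12, and 16 GPUs span 2, 3, and 4 nodes (gpux8, gpux12,
# gpux16); CPU-only jobs (0 GPUs) go to the cpu partition.
pick_partition() {
  case "$1" in
    0)                echo "cpu" ;;
    1|2|3|4|8|12|16)  echo "gpux$1" ;;
    *)                echo "no partition offers $1 GPUs" >&2; return 1 ;;
  esac
}

pick_partition 4   # prints: gpux4
pick_partition 0   # prints: cpu
```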
HAL Wrapper Suite Example Job Scripts
...