...
For complete Slurm documentation, see https://slurm.schedmd.com/. Here we show only simple examples with system-specific instructions.
HAL Slurm Wrapper Suite (Recommended)
Introduction
The HAL Slurm Wrapper Suite is designed to help users run jobs on the HAL system easily and efficiently. The current version, "swsuite-v0.1", includes two wrappers:
srun → swrun : requests resources and runs an interactive job.
sbatch → swbatch : requests resources and submits a batch script to Slurm.
Usage
- swrun -q <queue_name> -c <cpu_per_gpu> -t <walltime>
- <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
- <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
- <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
- example: swrun -q gpux4 -c 36 -t 72 (requests a full node: 1x node, 4x gpus, 144x cpus, 72x hours)
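For reference, a swrun call with the default options likely maps to a plain srun invocation along these lines. This is a hypothetical sketch: the exact flag mapping swrun uses internally is an assumption, not documented here.

```shell
# Build the srun command that "swrun -q gpux1 -c 12 -t 24" might issue.
# Flag mapping (partition/cpus-per-task/time) is an assumption.
QUEUE=gpux1
CPUS=12
HOURS=24
SRUN_CMD="srun --partition=${QUEUE} --cpus-per-task=${CPUS} --time=${HOURS}:00:00 --pty bash"
echo "${SRUN_CMD}"
```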
- swbatch <run_script>
- <run_script> (required) : a standard Slurm batch script, which must specify the following:
- <job_name> (required) : job name.
- <output_file> (required) : output file name.
- <error_file> (required) : error file name.
- <queue_name> (required) : cpu, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
- <cpu_per_gpu> (optional) : 12 cpus (default), range from 12 cpus to 36 cpus.
- <walltime> (optional) : 24 hours (default), range from 1 hour to 72 hours.
- example: swbatch demo.sb
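A minimal demo.sb might look like the following sketch. The partition, time, and CPU values are illustrative defaults taken from the list above; the mapping of <cpu_per_gpu> to --cpus-per-task is an assumption, and the job body is a placeholder to replace with your own workload.

```shell
#!/bin/bash
#SBATCH --job-name=demo          # <job_name>
#SBATCH --output=demo.out        # <output_file>
#SBATCH --error=demo.err         # <error_file>
#SBATCH --partition=gpux1        # <queue_name>
#SBATCH --cpus-per-task=12       # <cpu_per_gpu> (default; mapping is an assumption)
#SBATCH --time=24:00:00          # <walltime> (default)

# Placeholder workload; replace with your own commands.
RESULT="demo job body executed"
echo "${RESULT}"
```

Submit it with `swbatch demo.sb`.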
New Job Queues
Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPU Allowed | Local Scratch | Description |
---|---|---|---|---|---|---|---|---|
gpu-debug | high | 4 hrs | 1 | 12-144 | 18-144 GB | 4 | none | |
gpux1 | normal | 72 hrs | 1 | 12-36 | 18-54 GB | 1 | none | |
gpux2 | normal | 72 hrs | 1 | 24-72 | 36-108 GB | 2 | none | |
gpux3 | normal | 72 hrs | 1 | 36-108 | 54-162 GB | 3 | none | |
gpux4 | normal | 72 hrs | 1 | 48-144 | 72-216 GB | 4 | none | |
cpu | normal | 72 hrs | 1 | 96-96 | 144-144 GB | 0 | none | |
gpux8 | low | 72 hrs | 2 | 48-144 | 72-216 GB | 8 | none | |
gpux12 | low | 72 hrs | 3 | 48-144 | 72-216 GB | 12 | none | |
gpux16 | low | 72 hrs | 4 | 48-144 | 72-216 GB | 16 | none | |
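Within a single node, the per-node limits in the table above scale linearly with GPU count: each GPU brings 12-36 CPUs and 18-54 GB of memory (the multi-node queues gpux8/gpux12/gpux16 repeat the 4-GPU per-node limits across 2-4 nodes). A quick sanity check of the gpux4 row:

```shell
# Per-GPU allotments read from the table: 12-36 CPUs, 18-54 GB memory.
GPUS=4
MIN_CPUS=$((12 * GPUS)); MAX_CPUS=$((36 * GPUS))
MIN_MEM=$((18 * GPUS));  MAX_MEM=$((54 * GPUS))
echo "gpux${GPUS}: ${MIN_CPUS}-${MAX_CPUS} CPUs, ${MIN_MEM}-${MAX_MEM} GB"
# → gpux4: 48-144 CPUs, 72-216 GB (matches the gpux4 row)
```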
Traditional Job Queues
Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs | Max Memory | Local Scratch (GB) | Description |
---|---|---|---|---|---|---|---|
debug | high | 4 hrs | 1-1 | 144 | 1.5 | None | designed for debug jobs on a single node |
solo | normal | 72 hrs | 1-1 | 144 | 1.5 | None | designed for sequential and/or parallel jobs on a single node |
ssd | normal | 72 hrs | 1-1 | 144 | 1.5 | 220 | similar to solo but with extra local scratch; limited to hal[01-04] |
batch | low | 72 hrs | 2-16 | 144 | 1.5 | None | designed for parallel jobs across 2-16 nodes (up to 64 GPUs) |
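Jobs on the traditional queues are submitted with plain sbatch. A minimal sketch of a two-node submission to the batch partition follows; the resource values are illustrative and the job body is a placeholder.

```shell
#!/bin/bash
#SBATCH --job-name=multinode-demo
#SBATCH --output=multinode-demo.out
#SBATCH --error=multinode-demo.err
#SBATCH --partition=batch
#SBATCH --nodes=2
#SBATCH --time=24:00:00

# Placeholder workload; replace with e.g. an MPI launch across the nodes.
NODES_REQUESTED=2
echo "requested ${NODES_REQUESTED} nodes on the batch partition"
```

Submit it with `sbatch <script_name>`.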
HAL Example Job Scripts
...
New users should review the example job scripts in "/opt/apps/samples-runscript" and request resources appropriate for their jobs.
...