
For complete SLURM documentation, see https://slurm.schedmd.com/. Here we show only simple examples with system-specific instructions.

HAL Job Queues

Partition Name | Priority | Max Walltime | Min-Max Nodes Allowed | Max CPUs Per Node | Max Memory Per CPU (GB) | Description
debug | high | 12 hrs | 1-1 | 144 | 1.5 | designed to access 1 GPU to run debug or short-term jobs
solo | normal | 72 hrs | 1-1 | 144 | 1.5 | designed to access 1 GPU to run long-term jobs
batch | normal | 72 hrs | 2-16 | 144 | 1.5 | designed to access 2-16 nodes (up to 64 GPUs) to run parallel jobs

HAL Example Job Scripts (Recommended)

New users should check the example job scripts at "/opt/apps/samples-runscript" and request adequate resources.

Script Name | Job Type | Partition | Max Walltime | Nodes | CPUs | GPUs | Memory (GB) | Description
run_debug_00gpu_036cpu_0216mem.sh | interactive | debug | 12:00:00 | 1 | 36 | 0 | 216 | submit interactive job, 1 full node for 12 hours, CPU-only task in "debug" partition
run_debug_01gpu_008cpu_0048mem.sh | interactive | debug | 12:00:00 | 1 | 8 | 1 | 48 | submit interactive job, 25% of 1 full node for 12 hours, GPU task in "debug" partition
run_debug_02gpu_016cpu_0096mem.sh | interactive | debug | 12:00:00 | 1 | 16 | 2 | 96 | submit interactive job, 50% of 1 full node for 12 hours, GPU task in "debug" partition
run_debug_04gpu_032cpu_0192mem.sh | interactive | debug | 12:00:00 | 1 | 32 | 4 | 192 | submit interactive job, 1 full node for 12 hours, GPU task in "debug" partition
sub_solo_01node_01gpu_08cpu_0048mem.sb | sbatch | solo | 72:00:00 | 1 | 8 | 1 | 48 | submit batch job, 25% of 1 full node for 72 hours, GPU task in "solo" partition
sub_solo_01node_02gpu_16cpu_0096mem.sb | sbatch | solo | 72:00:00 | 1 | 16 | 2 | 96 | submit batch job, 50% of 1 full node for 72 hours, GPU task in "solo" partition
sub_solo_01node_04gpu_32cpu_0192mem.sb | sbatch | solo | 72:00:00 | 1 | 32 | 4 | 192 | submit batch job, 1 full node for 72 hours, GPU task in "solo" partition
sub_batch_02node_08gpu_064cpu_0384mem.sb | sbatch | batch | 72:00:00 | 2 | 64 | 8 | 384 | submit batch job, 2 full nodes for 72 hours, GPU task in "batch" partition
sub_batch_16node_64gpu_512cpu_3072mem.sb | sbatch | batch | 72:00:00 | 16 | 512 | 64 | 3072 | submit batch job, 16 full nodes for 72 hours, GPU task in "batch" partition

Native SLURM style

Submit Interactive Job with "srun"

# request 1 node with 8 tasks and 1 V100 GPU for 1.5 hours in the "debug" partition,
# then start an interactive bash shell on the allocated node
srun --partition=debug --pty --nodes=1 \
--ntasks-per-node=8 --gres=gpu:v100:1 \
-t 01:30:00 --wait=0 \
--export=ALL /bin/bash

Submit Batch Job

sbatch [job_script]
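
A job script is a shell script with "#SBATCH" directives at the top. The sketch below is a minimal example modeled on the 25%-of-a-node "solo" configuration from the table above; the job name, output file name, and commands are placeholders, and the provided scripts under "/opt/apps/samples-runscript" remain the recommended starting point.

#!/bin/bash
#SBATCH --job-name=gpu_test        # placeholder job name
#SBATCH --partition=solo           # partition, see the queue table above
#SBATCH --time=72:00:00            # max walltime allowed in "solo"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8        # 8 CPUs, 25% of one node
#SBATCH --gres=gpu:v100:1          # 1 V100 GPU, same gres name as the srun example
#SBATCH --output=%x_%j.out         # output written to <job-name>_<job-id>.out

hostname
nvidia-smi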

Check Job Status

squeue                # check all jobs from all users
squeue -u [user_name] # check all jobs belonging to [user_name]

Cancel Running Job

scancel [job_id] # cancel job with [job_id]
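
scancel also accepts a user filter, which cancels every job belonging to that user:

scancel -u [user_name] # cancel all jobs belonging to [user_name]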

PBS style

Some PBS commands are supported by SLURM.

Check Node Status

pbsnodes

Check Job Status

qstat -f [job_number]

Check Queue Status

qstat

Delete Job

qdel [job_number]

Submit Batch Job

$ cat test.pbs
#!/usr/bin/sh
#PBS -N test
#PBS -l nodes=1
#PBS -l walltime=10:00

hostname
$ qsub test.pbs
107
$ cat test.pbs.o107
hal01.hal.ncsa.illinois.edu