...

squeue (Slurm command) → swqueue : check currently running jobs and computational resource status.


Info

The Slurm Wrapper Suite is designed with people new to Slurm in mind and simplifies many aspects of job submission in favor of automation. For advanced use cases, the native Slurm commands are still available for use.


Rule of Thumb

  • Minimize the required input options.
  • Stay consistent with the original Slurm run-script format.
  • Submit the job to a suitable partition based on the number of GPUs requested (or the number of nodes for CPU partitions); see the sketch below.
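
The partition name itself encodes the resources: gpuxN requests N GPUs and cpunN requests N full CPU nodes, so the wrapper can derive the complete resource request from the partition name plus the optional <cpu_per_gpu> value. The bash sketch below is illustrative only (it is not the actual swrun implementation, and the helper name build_request is hypothetical); the 4-GPUs-per-node and 96-CPUs-per-node figures come from the queue table further down this page.

Code Block (bash)
  # Illustrative sketch only -- not the real swrun code.
  # Shows how a wrapper could derive a resource request from a SWSuite partition name.
  build_request() {
      local partition="$1" cpu_per_gpu="${2:-16}"
      case "$partition" in
          gpux[0-9]*)                                  # GPU partitions: gpuxN = N GPUs
              local gpus="${partition#gpux}"
              local nodes=$(( (gpus + 3) / 4 ))        # nodes hold up to 4 GPUs each
              local cpus=$(( gpus * cpu_per_gpu ))
              echo "nodes=$nodes gpus=$gpus cpus=$cpus"
              ;;
          cpun[0-9]*)                                  # CPU partitions: cpunN = N nodes
              local nodes="${partition#cpun}"
              echo "nodes=$nodes gpus=0 cpus=$(( nodes * 96 ))"
              ;;
          *)
              echo "unknown partition: $partition" >&2
              return 1
              ;;
      esac
  }

  build_request gpux4 40   # -> nodes=1 gpus=4 cpus=160 (the full-node example below)
  build_request cpun2      # -> nodes=2 gpus=0 cpus=192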

...

  • swrun -p <partition_name> -c <cpu_per_gpu> -t <walltime> -r <reservation_name>
    • <partition_name> (required) : cpun1, cpun2, cpun4, cpun8, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 16 CPUs per GPU (default); valid range 16 to 40 CPUs.
    • <walltime> (optional) : 4 hours (default); 1 to 24 hours, specified as an integer number of hours.
    • <reservation_name> (optional) : reservation name granted to user.
    • example: swrun -p gpux4 -c 40 -t 24 (request a full node: 1x node, 4x gpus, 160x cpus, 24x hours)
    • Using interactive jobs to run long-running scripts is not recommended. If you are going to walk away from your computer while your script is running, consider submitting a batch job instead. Unattended interactive sessions can sit idle until they run out of walltime, blocking resources from other users. We will issue warnings when we find resource-heavy idle interactive sessions, and repeated offenses may result in revocation of access rights.
  • swbatch <run_script>
    • <run_script> (required) : batch script in the same format as a standard Slurm batch script (see demo.swb below).
    • <job_name> (optional) : job name.
    • <output_file> (optional) : output file name.
    • <error_file> (optional) : error file name.
    • <partition_name> (required) : cpun1, cpun2, cpun4, cpun8, gpux1, gpux2, gpux3, gpux4, gpux8, gpux12, gpux16.
    • <cpu_per_gpu> (optional) : 16 CPUs per GPU (default); valid range 16 to 40 CPUs.
    • <walltime> (optional) : 24 hours (default); 1 to 24 hours, specified as an integer number of hours.
    • <reservation_name> (optional) : reservation name granted to user.
    • example: swbatch demo.swb

      Code Block (bash): demo.swb
      #!/bin/bash
      
      #SBATCH --job-name="demo"
      #SBATCH --output="demo.%j.%N.out"   # %j = job ID, %N = node name
      #SBATCH --error="demo.%j.%N.err"
      #SBATCH --partition=gpux1           # SWSuite partition: 1 GPU on 1 node
      #SBATCH --time=4                    # walltime in hours (integer)
      
      srun hostname


  • swqueue : displays currently running jobs and computational resource status (wrapper around squeue).
    • example: swqueue
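
Putting the wrapper commands together, a typical session might look like the following. The partition, walltime, and file name are placeholders; adjust them for your own job.

Code Block (bash)
  # Start a 4-hour interactive session with 1 GPU (gpux1 partition)
  swrun -p gpux1 -t 4

  # Submit the batch job described by demo.swb, then check job and resource status
  swbatch demo.swb
  swqueue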

New Job Queues (SWSuite only)

Partition Name | Priority | Max Walltime | Nodes Allowed | Min-Max CPUs Per Node Allowed | Min-Max Mem Per Node Allowed | GPUs Allowed | Local Scratch | Description
gpux1 | normal | 24 hrs | 1 | 16-40 | 19.2-48 GB | 1 | none | Designed to access 1 GPU on 1 node to run sequential and/or parallel jobs.
gpux2 | normal | 24 hrs | 1 | 32-80 | 38.4-96 GB | 2 | none | Designed to access 2 GPUs on 1 node to run sequential and/or parallel jobs.
gpux3 | normal | 24 hrs | 1 | 48-120 | 57.6-144 GB | 3 | none | Designed to access 3 GPUs on 1 node to run sequential and/or parallel jobs.
gpux4 | normal | 24 hrs | 1 | 64-160 | 76.8-192 GB | 4 | none | Designed to access 4 GPUs on 1 node to run sequential and/or parallel jobs.
gpux8 | normal | 24 hrs | 2 | 64-160 | 76.8-192 GB | 8 | none | Designed to access 8 GPUs on 2 nodes to run sequential and/or parallel jobs.
gpux12 | normal | 24 hrs | 3 | 64-160 | 76.8-192 GB | 12 | none | Designed to access 12 GPUs on 3 nodes to run sequential and/or parallel jobs.
gpux16 | normal | 24 hrs | 4 | 64-160 | 76.8-192 GB | 16 | none | Designed to access 16 GPUs on 4 nodes to run sequential and/or parallel jobs.
cpun1 | normal | 24 hrs | 1 | 96-96 | 115.2-115.2 GB | 0 | none | Designed to access 96 CPUs on 1 node to run sequential and/or parallel jobs.
cpun2 | normal | 24 hrs | 2 | 96-96 | 115.2-115.2 GB | 0 | none | Designed to access 96 CPUs on 2 nodes to run sequential and/or parallel jobs.
cpun4 | normal | 24 hrs | 4 | 96-96 | 115.2-115.2 GB | 0 | none | Designed to access 96 CPUs on 4 nodes to run sequential and/or parallel jobs.
cpun8 | normal | 24 hrs | 8 | 96-96 | 115.2-115.2 GB | 0 | none | Designed to access 96 CPUs on 8 nodes to run sequential and/or parallel jobs.
cpun16 | normal | 24 hrs | 16 | 96-96 | 115.2-115.2 GB | 0 | none | Designed to access 96 CPUs on 16 nodes to run sequential and/or parallel jobs.
cpu_mini | normal | 24 hrs | 1 | 8-8 | 9.6-9.6 GB | 0 | none | Designed to access 8 CPUs on 1 node to run TensorBoard jobs.
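
As an example of targeting one of the multi-node partitions above, a minimal batch script for gpux8 (8 GPUs across 2 nodes) might look like the sketch below. It follows the demo.swb format shown earlier; the file name gpux8_example.swb and the job name are placeholders.

Code Block (bash): gpux8_example.swb (hypothetical)
  #!/bin/bash

  #SBATCH --job-name="multinode_demo"
  #SBATCH --output="multinode_demo.%j.%N.out"
  #SBATCH --error="multinode_demo.%j.%N.err"
  #SBATCH --partition=gpux8        # 8 GPUs on 2 nodes, per the table above
  #SBATCH --time=24                # walltime in hours (integer), max for this partition

  srun hostname                    # replace with the srun command for your application

Submit it with "swbatch gpux8_example.swb".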

...

Script Name | Job Type | Partition | Walltime | Nodes | CPUs | GPUs | Memory | Description
run_gpux1_16cpu_24hrs.sh | interactive | gpux1 | 24 hrs | 1 | 16 | 1 | 19.2 GB | Submit an interactive job: 1x node for 24 hours w/ 16x CPU, 1x GPU task in the "gpux1" partition.
run_gpux2_32cpu_24hrs.sh | interactive | gpux2 | 24 hrs | 1 | 32 | 2 | 38.4 GB | Submit an interactive job: 1x node for 24 hours w/ 32x CPU, 2x GPU task in the "gpux2" partition.
sub_gpux1_16cpu_24hrs.swb | batch | gpux1 | 24 hrs | 1 | 16 | 1 | 19.2 GB | Submit a batch job: 1x node for 24 hours w/ 16x CPU, 1x GPU task in the "gpux1" partition.
sub_gpux2_32cpu_24hrs.swb | batch | gpux2 | 24 hrs | 1 | 32 | 2 | 38.4 GB | Submit a batch job: 1x node for 24 hours w/ 32x CPU, 2x GPU task in the "gpux2" partition.
sub_gpux4_64cpu_24hrs.swb | batch | gpux4 | 24 hrs | 1 | 64 | 4 | 76.8 GB | Submit a batch job: 1x node for 24 hours w/ 64x CPU, 4x GPU task in the "gpux4" partition.
sub_gpux8_128cpu_24hrs.swb | batch | gpux8 | 24 hrs | 2 | 128 | 8 | 153.6 GB | Submit a batch job: 2x nodes for 24 hours w/ 128x CPU, 8x GPU task in the "gpux8" partition.
sub_gpux16_256cpu_24hrs.swb | batch | gpux16 | 24 hrs | 4 | 256 | 16 | 307.2 GB | Submit a batch job: 4x nodes for 24 hours w/ 256x CPU, 16x GPU task in the "gpux16" partition.
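
These templates can be used directly or copied as starting points for your own jobs. A typical invocation might look like the following; it assumes the scripts are in your current directory (their actual location is site-specific).

Code Block (bash)
  # Launch the interactive template (1 GPU, 16 CPUs, 24 hours in the gpux1 partition)
  ./run_gpux1_16cpu_24hrs.sh

  # Submit the matching batch template through the wrapper suite
  swbatch sub_gpux1_16cpu_24hrs.swb

  # Check job and resource status
  swqueue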

Native SLURM style

Available Queues

Name | Priority | Max Walltime | Max Nodes | Min/Max CPUs | Min/Max RAM | Min/Max GPUs | Description
cpu | normal | 24 hrs | 16 | 1-96 | 1.2 GB per CPU | 0 | Designed for CPU-only jobs.
gpu | normal | 24 hrs | 16 | 1-160 | 1.2 GB per CPU | 0-64 | Designed for jobs utilizing GPUs.
debug | high | 4 hrs | 1 | 1-160 | 1.2 GB per CPU | 0-4 | Designed for single-node, short jobs. Jobs submitted to this queue receive higher priority than other jobs of the same user.
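
For comparison with the wrapper-suite scripts, a native-style batch script targeting the gpu queue might look like the sketch below. This is a generic Slurm example rather than a site-provided template; the file name, resource values, and walltime are placeholders chosen to stay within the limits in the table above.

Code Block (bash): native_example.sb (hypothetical)
  #!/bin/bash

  #SBATCH --job-name="native_demo"
  #SBATCH --output="native_demo.%j.%N.out"
  #SBATCH --error="native_demo.%j.%N.err"
  #SBATCH --partition=gpu          # native "gpu" queue from the table above
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=16
  #SBATCH --gres=gpu:1             # standard Slurm syntax for requesting GPUs
  #SBATCH --time=04:00:00          # native Slurm walltime format (HH:MM:SS)

  srun hostname

Submit it with the native command "sbatch native_example.sb".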

Submit Interactive Job with "srun"

...