Introduction

The hal-login3 machine serves as a login node from which users can request computational resources on hal-dgx and overdrive.

How to log in to hal-login3

ssh <user_id>@hal-login3.ncsa.illinois.edu

Type sinfo to check the existing partitions

[dmu@hal-login3 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
arm          up 15-00:00:0      1   idle overdrive
x86*         up 15-00:00:0      1   idle hal-dgx
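
The listing can also be filtered in a script. The following is a minimal sketch, assuming the default sinfo column layout shown above (STATE in the fifth column). The helper name idle_partitions is hypothetical, and the sample text is copied from the listing above rather than queried live.

```shell
#!/bin/bash
# Hypothetical helper: print the partitions whose nodes are idle,
# assuming the default sinfo columns shown above (STATE is column 5).
# The trailing "*" that marks the default partition is stripped.
idle_partitions() {
  awk 'NR > 1 && $5 == "idle" { sub(/\*$/, "", $1); print $1 }'
}

# Sample input copied from the listing above. On hal-login3 you would
# pipe the real command instead:  sinfo | idle_partitions
sample='PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
arm          up 15-00:00:0      1   idle overdrive
x86*         up 15-00:00:0      1   idle hal-dgx'

printf '%s\n' "$sample" | idle_partitions
```

With the sample above, this prints arm and x86 on separate lines.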

Note: hal-login3 has no shared file system, so the directory layout and files are not the same across the three machines (hal-login3, hal-dgx, and overdrive).

Rules

  1. The maximum wall time for each job is 48 hours.
  2. The maximum number of GPUs one user can request is 4.
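
A request can be checked against these two limits before it is submitted. The following is a minimal sketch, not something the scheduler provides: check_request is a hypothetical helper, and it assumes the wall time is given in whole hours.

```shell
#!/bin/bash
# Limits taken from the Rules list above.
MAX_HOURS=48
MAX_GPUS=4

# Hypothetical helper: check_request <hours> <gpus>
# Returns 0 if the request is within both limits, 1 otherwise.
check_request() {
  local hours=$1 gpus=$2
  if [ "$hours" -gt "$MAX_HOURS" ]; then
    echo "wall time ${hours}h exceeds the ${MAX_HOURS}h limit"
    return 1
  fi
  if [ "$gpus" -gt "$MAX_GPUS" ]; then
    echo "${gpus} GPUs exceeds the ${MAX_GPUS}-GPU limit"
    return 1
  fi
  echo "ok"
}

check_request 24 4            # within both limits
check_request 72 1 || true    # wall time too long
check_request 12 8 || true    # too many GPUs
```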

Access to hal-dgx

To run jobs on hal-dgx, submit an interactive job or a batch script that requests the resources you need.

1. Interactive

Request 1x GPU along with 32x CPU cores for 4 hours

srun --partition=x86 --time=4:00:00 --nodes=1 --ntasks-per-node=32 --sockets-per-node=1 --cores-per-socket=16 --threads-per-core=2 --mem-per-cpu=4000 --wait=0 --export=ALL --gres=gpu:a100:1 --pty /bin/bash

Request 2x GPU along with 64x CPU cores for 12 hours

srun --partition=x86 --time=12:00:00 --nodes=1 --ntasks-per-node=64 --sockets-per-node=1 --cores-per-socket=32 --threads-per-core=2 --mem-per-cpu=4000 --wait=0 --export=ALL --gres=gpu:a100:2 --pty /bin/bash

Request 4x GPU along with 128x CPU cores for 24 hours

srun --partition=x86 --time=24:00:00 --nodes=1 --ntasks-per-node=128 --sockets-per-node=1 --cores-per-socket=64 --threads-per-core=2 --mem-per-cpu=4000 --wait=0 --export=ALL --gres=gpu:a100:4 --pty /bin/bash
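
The three commands above follow one pattern: each GPU comes with 32 CPUs and 16 cores per socket. The following is a minimal sketch that builds the command from a GPU count and a wall time in hours. The helper name dgx_srun_cmd is hypothetical, and it only prints the command, it does not run it.

```shell
#!/bin/bash
# Hypothetical helper: dgx_srun_cmd <gpus> <hours>
# Prints (does not run) an srun command following the pattern of the
# three examples above: 32 CPUs and 16 cores per socket for each GPU.
dgx_srun_cmd() {
  local gpus=$1 hours=$2
  local ntasks=$((gpus * 32))
  local cores=$((gpus * 16))
  echo "srun --partition=x86 --time=${hours}:00:00 --nodes=1" \
       "--ntasks-per-node=${ntasks} --sockets-per-node=1" \
       "--cores-per-socket=${cores} --threads-per-core=2" \
       "--mem-per-cpu=4000 --wait=0 --export=ALL" \
       "--gres=gpu:a100:${gpus} --pty /bin/bash"
}

dgx_srun_cmd 2 12
```

The call above prints the same command as the 2x GPU example. To start the session on hal-login3, run the printed line.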

2. Batch script

#!/bin/bash
#SBATCH --job-name="example"
#SBATCH --output="example.%j.%N.out"
#SBATCH --partition=x86
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=16
#SBATCH --threads-per-core=2
#SBATCH --mem-per-cpu=4000
#SBATCH --gres=gpu:a100:1
#SBATCH --export=ALL

cd ~

echo "STARTING $(date)"

srun hostname
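
Once the script is saved, it is submitted with sbatch and monitored with squeue. The following is a minimal sketch, assuming the script above was saved as example.sh (a hypothetical file name). The wrapper submit_job is also hypothetical, and it skips submission when sbatch is not on the PATH.

```shell
#!/bin/bash
# Hypothetical file name; save the batch script above under this name.
script=example.sh

# Hypothetical wrapper: submit_job <file>
submit_job() {
  local file=$1
  if ! command -v sbatch >/dev/null 2>&1; then
    echo "sbatch not found; run this on hal-login3"
    return 0
  fi
  if [ ! -f "$file" ]; then
    echo "missing batch script: $file"
    return 1
  fi
  sbatch "$file"        # prints "Submitted batch job <jobid>"
  squeue -u "$USER"     # lists your pending and running jobs
}

submit_job "$script"
```

With the --output pattern used in the script, the job's output file is named example.<jobid>.<nodename>.out.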

Access to overdrive

To run jobs on overdrive, submit an interactive job or a batch script that requests the resources you need.

1. Interactive

Request 1x GPU along with 40x CPU cores for 4 hours

srun --partition=arm --time=4:00:00 --nodes=1 --ntasks-per-node=40 --sockets-per-node=1 --cores-per-socket=40 --threads-per-core=1 --mem-per-cpu=3200 --wait=0 --export=ALL --gres=gpu:a100:1 --pty /bin/bash

Request 2x GPU along with 80x CPU cores for 4 hours

srun --partition=arm --time=4:00:00 --nodes=1 --ntasks-per-node=80 --sockets-per-node=1 --cores-per-socket=80 --threads-per-core=1 --mem-per-cpu=3200 --wait=0 --export=ALL --gres=gpu:a100:2 --pty /bin/bash
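
In each interactive command on this page, the value of --ntasks-per-node equals sockets-per-node x cores-per-socket x threads-per-core. The following is a minimal sketch of that arithmetic. The helper name logical_cpus is hypothetical, and the relationship is read off the examples here rather than being a Slurm requirement.

```shell
#!/bin/bash
# Hypothetical helper:
#   logical_cpus <sockets-per-node> <cores-per-socket> <threads-per-core>
# Prints the product, which matches --ntasks-per-node in the examples.
logical_cpus() {
  echo $(( $1 * $2 * $3 ))
}

logical_cpus 1 40 1   # overdrive, 1 GPU  -> 40
logical_cpus 1 80 1   # overdrive, 2 GPUs -> 80
logical_cpus 1 16 2   # hal-dgx, 1 GPU    -> 32
```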

2. Batch script

#!/bin/bash
#SBATCH --job-name="example"
#SBATCH --output="example.%j.%N.out"
#SBATCH --partition=arm
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=40
#SBATCH --threads-per-core=1
#SBATCH --mem-per-cpu=3200
#SBATCH --gres=gpu:a100:1
#SBATCH --export=ALL

cd ~

echo "STARTING $(date)"

srun hostname