...

Run application on Delta

Code Block
title: nsys command line example with serial or python cuda code
$ srun nsys profile -o /path/to/mynsys.out --stats=true ./a.out


Code Block
title: nsys wrapper for mpi and HPC cuda codes
# a plain "srun nsys profile ..." works for simple serial cuda codes

# use this technique to profile a single rank of a more complex MPI application (wrapper shown)

[arnoldg@dt-login03 gromacs]$ cat nsys_wrap.sh 
#!/bin/bash

# Use $PMI_RANK for MPICH, $OMPI_COMM_WORLD_RANK for Open MPI, and $SLURM_PROCID with srun.
# Only the selected rank (rank 1 here) runs under nsys; all other ranks run the command unmodified.
if [ "$SLURM_PROCID" -eq 1 ]; then
  nsys profile -e NSYS_MPI_STORE_TEAMS_PER_RANK=1 -o gmx.nsys --gpu-metrics-set=2 "$@"
else
  "$@"
fi
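
The rank test in the wrapper above depends on which launcher started the job. As a minimal sketch (not part of the original wrapper), the variant below takes the rank from whichever of the three variables named in the comment is set, so one script could serve srun, Open MPI, and MPICH. The names `detect_rank` and `PROFILE_RANK` are hypothetical, and falling back to rank 0 when no variable is set is an assumption; the nsys options are copied unchanged from the wrapper above.

```shell
#!/bin/bash
# Sketch only: detect_rank and PROFILE_RANK are hypothetical names,
# not part of nsys, Slurm, or any MPI implementation.

# Print this process's rank, trying srun, then Open MPI, then MPICH.
# Falls back to 0 (an assumption) when none of the variables is set.
detect_rank() {
  echo "${SLURM_PROCID:-${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}}"
}

# Rank to profile; defaults to 1 to match the wrapper above.
PROFILE_RANK="${PROFILE_RANK:-1}"

# Only act as a wrapper when a command line was passed in.
if [ "$#" -gt 0 ]; then
  if [ "$(detect_rank)" -eq "$PROFILE_RANK" ]; then
    # Same nsys options as the original wrapper.
    exec nsys profile -e NSYS_MPI_STORE_TEAMS_PER_RANK=1 -o gmx.nsys --gpu-metrics-set=2 "$@"
  else
    exec "$@"
  fi
fi
```

It would be launched the same way as nsys_wrap.sh, with the application command line as its arguments.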

...

Code Block
title: batch script, --constraint=
#SBATCH --constraint=perf,nvperf
...
# the slurm script should run the wrapper above instead of "nsys ..."
time srun $SLURM_SUBMIT_DIR/nsys_wrap.sh \
  gmx_mpi mdrun -nb gpu -pin on -notunepme -dlb yes -v -resethway -noconfout -nsteps 4000 -s water_pme.tpr

# see https://docs.nvidia.com/nsight-systems/UserGuide/index.html#cli-analyze-mpi-codes

...

Copy the resulting files to your local laptop (Downloads/ or Documents/)

sftp is shown below; you could also use scp, Globus Online, or an sshfs mount from your laptop.

Code Block
title: nsys output file example names
# Delta
[arnoldg@rgpu02 rgpu02]$ ls /tmp/nsys*
/tmp/nsys-report-988d.sqlite  /tmp/nsys-report-b26d.nsys-rep
[arnoldg@rgpu02 rgpu02]$ 

# local laptop (MacOS example)
(base) galen@macbookair-m1-042020 ~ % cd Downloads
(base) galen@macbookair-m1-042020 Downloads % pwd
/Users/galen/Downloads
(base) galen@macbookair-m1-042020 Downloads % sftp arnoldg@rgpu02.delta.ncsa.illinois.edu

NCSA Delta System

Login with NCSA Kerberos + Duo multi-factor.

DUO Documentation:  https://go.ncsa.illinois.edu/2fa

(arnoldg@rgpu02.delta.ncsa.illinois.edu) Password: 
(arnoldg@rgpu02.delta.ncsa.illinois.edu) Duo two-factor login for arnoldg

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-1120
 2. Duo Push to Ipad mini (iOS)
 3. Duo Push to red ipod (iOS)

Passcode or option (1-3): 1
Connected to rgpu02.delta.ncsa.illinois.edu.
sftp> cd /tmp
sftp> mget nsys*
Fetching /tmp/nsys-report-988d.sqlite to nsys-report-988d.sqlite
/tmp/nsys-report-988d.sqlite                  100%  748KB   2.7MB/s   00:00    
Fetching /tmp/nsys-report-b26d.nsys-rep to nsys-report-b26d.nsys-rep
/tmp/nsys-report-b26d.nsys-rep                100%  288KB   1.7MB/s   00:00    
sftp> 
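
The same files could also be fetched non-interactively with scp. This is a minimal sketch, assuming the same user, host, and /tmp/nsys* pattern as the sftp session above. The helper name `scp_fetch_cmd` is hypothetical, and it only prints the command so it can be inspected before it is run; the Kerberos password and Duo prompts would still appear when the printed command is executed.

```shell
# Sketch only: scp_fetch_cmd is a hypothetical helper, not a system command.
# It prints an scp command that copies files matching a remote pattern
# into a local directory (default: the current directory).
# The remote path is single-quoted in the output so the glob is expanded
# on the remote host rather than by the local shell.
scp_fetch_cmd() {
  printf "scp '%s@%s:%s' %s\n" "$1" "$2" "$3" "${4:-.}"
}

# Values taken from the sftp session above.
scp_fetch_cmd arnoldg rgpu02.delta.ncsa.illinois.edu '/tmp/nsys*' .
```

Run the printed command from the laptop directory where the reports should land, for example Downloads.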

...

Code Block
title: installing nvtx via pip
[arnoldg@rgpu02 nvtx]$ module load python cuda
[arnoldg@rgpu02 nvtx]$ C_INCLUDE_PATH=$CUDA_HOME/include pip install nvtx
Collecting nvtx
  Using cached nvtx-0.2.3.tar.gz (10 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nvtx
  Building wheel for nvtx (pyproject.toml) ... done
  Created wheel for nvtx: filename=nvtx-0.2.3-cp39-cp39-linux_x86_64.whl size=177533 sha256=875e0f9d4322d07db4bce397b4281ce301f348cf72e00629b0d7bc23a7db0231
  Stored in directory: /u/arnoldg/.cache/pip/wheels/66/7a/44/68c48f02433263010768b540b0e90bf5a224dd7e6612d88887
Successfully built nvtx
Installing collected packages: nvtx
Successfully installed nvtx-0.2.3
[arnoldg@rgpu02 nvtx]$ 

...

Code Block
nsys profile --gpu-metrics-device=all \
	--gpu-metrics-frequency=20000 <application>   # get metrics from the cuda libs/api

ncu --metrics "regex:.*" <application>   # get all gpu metrics from the hardware (not yet working on Delta)

Delta script and nsight-systems view of the resulting report

...