...
Run application on Delta
Code Block |
---|
title | nsys command line examples with serial or Python CUDA code |
---|
|
$ srun nsys profile -o /path/to/mynysys.out --stats=true ./a.out |
Code Block |
---|
title | nsys wrapper for MPI and HPC CUDA codes |
---|
|
# Running nsys directly (without a wrapper) works for simple serial CUDA codes.
# To profile one rank of a more complex MPI application, use a wrapper script (shown here).
[arnoldg@dt-login03 gromacs]$ cat nsys_wrap.sh
#!/bin/bash
# Rank variable by launcher: $PMI_RANK for MPICH, $OMPI_COMM_WORLD_RANK for Open MPI,
# and $SLURM_PROCID with srun.
# Open MPI form of the test: if [ "$OMPI_COMM_WORLD_RANK" -eq 0 ]; then
# Profile only rank 1; every other rank runs the application unprofiled.
if [ "$SLURM_PROCID" -eq 1 ]; then
    nsys profile -e NSYS_MPI_STORE_TEAMS_PER_RANK=1 -o gmx.nsys --gpu-metrics-set=2 "$@"
else
    "$@"
fi
|
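The rank test in the wrapper depends on which launcher started the job. A minimal sketch of making that choice automatic is below; the helper names `detect_rank`, `should_profile`, and `report_name` are hypothetical (not part of nsys or Slurm), and the fallback to rank 0 when no launcher variable is set is an assumption.

```shell
#!/bin/bash
# Sketch only: helper names below are illustrative, not an nsys or Slurm API.

# Print the rank from whichever launcher variable is set:
# srun sets SLURM_PROCID, Open MPI sets OMPI_COMM_WORLD_RANK, MPICH sets PMI_RANK.
# Falls back to 0 (assumption) when none is set, e.g. for a serial run.
detect_rank() {
    echo "${SLURM_PROCID:-${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}}"
}

# Succeed only on the rank that should be profiled (default target: rank 0).
should_profile() {
    local target="${1:-0}"
    [ "$(detect_rank)" -eq "$target" ]
}

# Build a per-rank report name such as gmx.rank1 from a prefix.
report_name() {
    local prefix="$1"
    echo "${prefix}.rank$(detect_rank)"
}
```

With these helpers sourced, the wrapper's test could read `if should_profile 1; then nsys profile -o "$(report_name gmx)" "$@"; else "$@"; fi`, which keeps the same behavior under srun while also working under the other launchers.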
...
Code Block |
---|
title | batch script, --constraint= |
---|
|
#SBATCH --constraint=perf,nvperf
...
# the slurm script should run the wrapper above instead of "nsys ..."
time srun $SLURM_SUBMIT_DIR/nsys_wrap.sh \
gmx_mpi mdrun -nb gpu -pin on -notunepme -dlb yes -v -resethway -noconfout -nsteps 4000 -s water_pme.tpr
# see https://docs.nvidia.com/nsight-systems/UserGuide/index.html#cli-analyze-mpi-codes |
...
Copy the resulting files to your local laptop (Downloads/ or Documents/)
sftp is shown below; you could also use scp, Globus Online, or an sshfs mount from your laptop.
Code Block |
---|
title | nsys output file example names |
---|
|
# Delta
[arnoldg@rgpu02 rgpu02]$ ls /tmp/nsys*
/tmp/nsys-report-988d.sqlite /tmp/nsys-report-b26d.nsys-rep
[arnoldg@rgpu02 rgpu02]$
# local laptop (macOS example)
(base) galen@macbookair-m1-042020 ~ % cd Downloads
(base) galen@macbookair-m1-042020 Downloads % pwd
/Users/galen/Downloads
(base) galen@macbookair-m1-042020 Downloads % sftp arnoldg@rgpu02.delta.ncsa.illinois.edu
NCSA Delta System
Login with NCSA Kerberos + Duo multi-factor.
DUO Documentation: https://go.ncsa.illinois.edu/2fa
(arnoldg@rgpu02.delta.ncsa.illinois.edu) Password:
(arnoldg@rgpu02.delta.ncsa.illinois.edu) Duo two-factor login for arnoldg
Enter a passcode or select one of the following options:
1. Duo Push to XXX-XXX-1120
2. Duo Push to Ipad mini (iOS)
3. Duo Push to red ipod (iOS)
Passcode or option (1-3): 1
Connected to rgpu02.delta.ncsa.illinois.edu.
sftp> cd /tmp
sftp> mget nsys*
Fetching /tmp/nsys-report-988d.sqlite to nsys-report-988d.sqlite
/tmp/nsys-report-988d.sqlite 100% 748KB 2.7MB/s 00:00
Fetching /tmp/nsys-report-b26d.nsys-rep to nsys-report-b26d.nsys-rep
/tmp/nsys-report-b26d.nsys-rep 100% 288KB 1.7MB/s 00:00
sftp> |
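As an alternative to the interactive sftp session above, the same reports could be fetched non-interactively with scp. The sketch below only composes and prints the command so it can be checked before running; `build_scp_cmd` is a hypothetical helper name, and the user, host, and `/tmp/nsys*` pattern are taken from the session above.

```shell
#!/bin/bash
# Sketch only: build_scp_cmd is an illustrative helper, not a standard tool.

# Print an scp command that copies remote files matching a glob to a local directory.
# The remote spec is single-quoted so the glob expands on the remote host, not locally.
build_scp_cmd() {
    local user="$1" host="$2" remote_glob="$3" dest="${4:-.}"
    printf "scp '%s@%s:%s' %s\n" "$user" "$host" "$remote_glob" "$dest"
}

# Example: show the command for the two reports listed above.
build_scp_cmd arnoldg rgpu02.delta.ncsa.illinois.edu '/tmp/nsys*' Downloads/
```

Copy the printed line into the laptop terminal to run it; the Kerberos password and Duo prompts appear just as they do for sftp.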
...
Code Block |
---|
title | installing nvtx via pip |
---|
|
[arnoldg@rgpu02 nvtx]$ module load python cuda
[arnoldg@rgpu02 nvtx]$ C_INCLUDE_PATH=$CUDA_HOME/include pip install nvtx
Collecting nvtx
Using cached nvtx-0.2.3.tar.gz (10 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nvtx
Building wheel for nvtx (pyproject.toml) ... done
Created wheel for nvtx: filename=nvtx-0.2.3-cp39-cp39-linux_x86_64.whl size=177533 sha256=875e0f9d4322d07db4bce397b4281ce301f348cf72e00629b0d7bc23a7db0231
Stored in directory: /u/arnoldg/.cache/pip/wheels/66/7a/44/68c48f02433263010768b540b0e90bf5a224dd7e6612d88887
Successfully built nvtx
Installing collected packages: nvtx
Successfully installed nvtx-0.2.3
[arnoldg@rgpu02 nvtx]$ |
...
Code Block |
---|
nsys profile --gpu-metrics-device=all \
--gpu-metrics-frequency=20000 <application> # get metrics from the CUDA libraries/API
ncu --metrics "regex:.*" <application> # get all GPU metrics from the hardware (not yet working on Delta) |
Delta script and Nsight Systems view of the resulting report
...