...
The compiler wrappers link in the libmpi_gtl_cuda library, which enables GPU-RDMA with Cray MPI.
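With Cray MPICH, the GPU-aware code paths provided by the GTL library usually also have to be switched on at run time. The environment variable below is standard Cray MPICH usage and is an assumption on our part, not something stated on this page:

```shell
# Assumption: standard Cray MPICH run-time switch (not from this page).
# Enables the GPU-aware (GPU-RDMA) paths provided by the linked
# libmpi_gtl_cuda library; without it, passing GPU buffers to MPI
# calls typically fails at run time.
export MPICH_GPU_SUPPORT_ENABLED=1
```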
GPU direct support
These MPI implementations should be used only when an application needs MPI plus CUDA GPU-direct support. Their pure-MPI performance will be lower than that of the MPI implementations above for small message sizes; for large messages, performance should be close to equivalent to the CPU-only implementations.
openmpi
choose one of:
```
module load gcc openmpi/4.1.5+cuda    # the default gcc/11.4.0
module load nvhpc openmpi/4.1.5+cuda  # will load the openmpi/4.1.5+cuda built with nvhpc compilers
# in testing mode
module load gcc openmpi/5.0.1+cuda    # only mpirun is supported, do not use with srun
```
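After loading one of these modules, you can sanity-check that the Open MPI build is actually CUDA-aware. This sketch uses the standard `ompi_info` tool and its `mpi_built_with_cuda_support` MCA parameter, which are part of Open MPI itself rather than anything specific to this page:

```shell
# Ask the loaded Open MPI build whether it was compiled with CUDA support
ompi_info --parsable --all | grep mpi_built_with_cuda_support
# A CUDA-aware build should print a line ending in :value:true
```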
See also: gpudirect s10 vs s11 performance
Running a CrayPE job
See the Running jobs section above for details on the partitions etc.
```
[gbauer@dt-login04 ~]$ module unload openmpi gcc
[gbauer@dt-login04 ~]$ module load PrgEnv-gnu cuda craype-x86-milan craype-accel-ncsa
[gbauer@dt-login04 ~]$ srun --account=bbka-delta-gpu --partition=gpuA40x4 --nodes=2 --ntasks-per-node=2 --cpus-per-task=2 --gpus-per-task=1 --mem=0 --time=00:10:00 ./xthi
srun: job 2735921 queued and waiting for resources
srun: job 2735921 has been allocated resources
Rank 0, thread 0, on gpub003.delta.ncsa.illinois.edu. core = 0,1,(6.548536 seconds).
Rank 0, thread 1, on gpub003.delta.ncsa.illinois.edu. core = 0,1,(6.548521 seconds).
Rank 1, thread 1, on gpub003.delta.ncsa.illinois.edu. core = 2,3,(18.908121 seconds).
Rank 1, thread 0, on gpub003.delta.ncsa.illinois.edu. core = 2,3,(18.908134 seconds).
Rank 2, thread 0, on gpub004.delta.ncsa.illinois.edu. core = 0,1,(10.076774 seconds).
Rank 2, thread 1, on gpub004.delta.ncsa.illinois.edu. core = 0,1,(10.076761 seconds).
Rank 3, thread 0, on gpub004.delta.ncsa.illinois.edu. core = 2,3,(16.366058 seconds).
Rank 3, thread 1, on gpub004.delta.ncsa.illinois.edu. core = 2,3,(16.366045 seconds).
```
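For a non-interactive run, the same job can be written as a batch script. This is a sketch that simply reuses the account, partition, and resource flags from the srun example above; adjust them for your own allocation:

```shell
#!/bin/bash
# Sketch of a batch-script version of the interactive example above.
# All values are taken from that example; substitute your own account.
#SBATCH --account=bbka-delta-gpu
#SBATCH --partition=gpuA40x4
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=2
#SBATCH --gpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=00:10:00

module unload openmpi gcc
module load PrgEnv-gnu cuda craype-x86-milan craype-accel-ncsa
srun ./xthi
```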
...
Cray Programming Environments
...