...
Code Block
$ squeue -u $USER
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) FEATURES
2734871       cpu     bash   gbauer  R       1:47      2 cn[122-123]      ss11
GPU direct support
These MPI implementations should be used only when an application needs MPI plus CUDA with GPU-direct support. For small message sizes their pure-MPI performance will be lower than that of the MPI implementations above; for large messages it should be close to equivalent to the CPU-only builds.
openmpi
choose one of:
Code Block
module load gcc openmpi/4.1.5+cuda    # uses the default gcc/11.4.0
module load nvhpc openmpi/4.1.5+cuda  # loads the openmpi/4.1.5+cuda built with the nvhpc compilers

# in testing mode
module load gcc openmpi/5.0.1+cuda    # only mpirun is supported, do not use with srun
see also: gpudirect s10 vs s11 performance
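To make concrete what "MPI plus CUDA with GPU-direct" looks like in application code, here is a minimal sketch (not from this guide; the file name and message size are illustrative) that hands a device pointer directly to MPI. It assumes one of the CUDA-aware openmpi builds above is loaded and that two ranks are launched.

Code Block
/* gpu_direct_demo.c - hypothetical minimal CUDA-aware MPI exchange:
 * device pointers are passed straight to MPI_Send/MPI_Recv instead of
 * being staged through host memory.
 * Build: mpicc gpu_direct_demo.c -o gpu_direct_demo -lcudart
 * Run:   mpirun -n 2 ./gpu_direct_demo
 */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *dbuf;                       /* buffer in GPU device memory */
    cudaMalloc((void **)&dbuf, 1024 * sizeof(double));

    /* With GPU-direct support the MPI library accepts the device
     * pointer directly and stages or RDMAs the data as appropriate. */
    if (rank == 0)
        MPI_Send(dbuf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dbuf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}

Without GPU-direct support, each such transfer would first have to be copied between the device buffer and a host staging buffer with cudaMemcpy before and after the MPI call.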
CrayPE Programming Environments
...
Code Block
------------------ /opt/cray/pe/lmod/modulefiles/craype-targets/default ------------------
craype-accel-amd-gfx908   craype-hugepages128M   craype-hugepages512M   craype-x86-milan
craype-accel-amd-gfx90a   craype-hugepages16M    craype-hugepages64M    craype-x86-rome
craype-accel-amd-gfx940   craype-hugepages1G     craype-hugepages8M     craype-x86-spr-hbm
craype-accel-host         craype-hugepages256M   craype-network-none    craype-x86-spr
craype-accel-intel-max    craype-hugepages2G     craype-network-ofi     craype-x86-trento
craype-accel-nvidia70     craype-hugepages2M     craype-network-ucx
craype-accel-nvidia80     craype-hugepages32M    craype-x86-genoa
craype-arm-grace          craype-hugepages4M     craype-x86-milan-x

--------------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------------
PrgEnv-cray/8.4.0     cray-R/4.2.1.2          cray-pals/1.2.12      gdb4hpc/4.15.1
PrgEnv-gnu/8.4.0      cray-ccdb/5.0.1         cray-pmi/6.1.12       papi/7.0.1.1
PrgEnv-nvhpc/8.4.0    cray-cti/2.18.1         cray-python/3.10.10   perftools-base/23.09.0
PrgEnv-nvidia/8.4.0   cray-dsmml/0.2.2        cray-stat/4.12.1      sanitizers4hpc/1.1.1
atp/3.15.1            cray-dyninst/12.3.0     craype/2.7.23         valgrind4hpc/2.13.1
cce/16.0.1            cray-libpals/1.2.12     craypkg-gen/1.3.30
cpe-cuda/23.09        cray-libsci/23.09.1.1   gcc-native/10.3
cpe/23.09             cray-mrnet/5.1.1        gcc-native/11.2 (D)
Using a Cray programming environment
Here is how you can enable the GNU CrayPE programming environment.
Code Block
[gbauer@dt-login04 ~]$ module unload openmpi gcc
[gbauer@dt-login04 ~]$ module load PrgEnv-gnu cuda craype-x86-milan craype-accel-ncsa
Compiling
Again, you will need to use the cc, CC and ftn compiler wrappers.
Code Block
# Use the HPE/Cray compiler wrappers cc, CC and ftn to compile and link.
# You might need to add libraries manually.
[gbauer@dt-login04 ~]$ cc -fopenmp -o xthi xthi.c -lcuda -lcudart
The compiler wrappers link in the libmpi_gtl_cuda library, which enables GPU RDMA with the Cray MPI; you can verify the linkage with, for example, ldd ./xthi | grep gtl.
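The source of xthi.c itself is not shown in this guide. As a rough sketch, an affinity reporter of this kind (hypothetical; the real xthi.c may differ, and the -lcuda -lcudart flags above only matter if the source also queries CUDA devices) can look like:

Code Block
/* xthi.c - minimal affinity-reporting sketch. For every MPI rank and
 * OpenMP thread it prints the hostname and the CPU the thread is
 * currently executing on. Build: cc -fopenmp -o xthi xthi.c */
#define _GNU_SOURCE
#include <mpi.h>
#include <omp.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    char host[64];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    #pragma omp parallel
    {
        /* sched_getcpu() reports the core the calling thread is on */
        printf("host %s rank %d thread %d cpu %d\n",
               host, rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}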
Running a CrayPE job
See the Running jobs section above for details on the partitions etc.
...