MILC

From Nuno:

The GPU on the Jetson (GK20A), with CUDA compute capability 3.2, also supports double precision.

Performance for a 16^4 lattice volume for gauge fixing using MILC+QUDA.

With overrelaxation code:

GPU: time_GK20A / time_980GTX

single ~11.2x

double ~9.7x

CPU: time_ARM / time_hybrid

single ~4.5x

double ~4.9x

With FFT:

GPU: time_GK20A / time_980GTX

single ~13.7x

double ~9.6x

The GTX 980 has ~10x more CUDA cores than the GK20A.

Problems that I found when compiling on the Jetson:

- had to remove -m32 from the QUDA code

- cannot use cudaHostRegister(); from the CUDA 6.5 toolkit release notes:

"Mapping host memory allocated outside of CUDA to device memory is not allowed on ARM; because of this, cudaHostRegister() is not supported by the CUDA driver on ARM platforms. If required, cudaHostAlloc() with the flag cudaHostAllocMapped can be used to allocate device-mapped host-accessible memory"

- compiling was a bit slow.
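
For the cudaHostRegister() restriction quoted above, the release notes point to cudaHostAlloc() with the cudaHostAllocMapped flag as the alternative. The following is only a minimal sketch of that pattern, not the change actually made in QUDA: the kernel `scale` and the helper `scale_with_mapped_memory` are hypothetical names introduced for illustration. It allocates device-mapped host memory, obtains the matching device pointer, and lets a kernel update the buffer in place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: scales each element in place.
__global__ void scale(double *data, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
// Call it once, before any other CUDA call: older toolkits reject
// cudaSetDeviceFlags() after the context has been created.
int scale_with_mapped_memory(int n, double factor)
{
    // Allow host allocations to be mapped into the device address space.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return -1;

    // Instead of malloc() + cudaHostRegister(), let CUDA allocate the buffer.
    double *host = nullptr;
    if (cudaHostAlloc((void **)&host, n * sizeof(double),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) host[i] = (double)i;

    // Device-side alias of the same memory.
    double *dev = nullptr;
    if (cudaHostGetDevicePointer((void **)&dev, host, 0) != cudaSuccess) {
        cudaFreeHost(host);
        return -1;
    }

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, n, factor);

    // Wait for the kernel before reading the results on the host.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(host);
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != (double)i * factor) ++mismatches;

    cudaFreeHost(host);
    return mismatches;
}

int main()
{
    int bad = scale_with_mapped_memory(1 << 10, 2.0);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

On the Jetson this would presumably be built with something like `nvcc -arch=sm_32`, matching compute capability 3.2. Code that registers an existing malloc() buffer would need its allocation site changed to use this pattern, since the buffer has to come from cudaHostAlloc() in the first place.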

Best regards,

Nuno
