ImageNet Distributed Mixed-precision Training Benchmark


GitHub repo with all source code and details: https://github.com/richardkxu/distributed-pytorch

Jupyter notebook tutorial covering the key points: https://github.com/richardkxu/distributed-pytorch/blob/master/ddp_apex_tutorial.ipynb

HAL paper: https://dl.acm.org/doi/10.1145/3311790.3396649

Benchmark Results

Training Time: Time to solution during training, measured from 2 to 64 GPUs. ImageNet training with ResNet-50 on 2 GPUs takes 20 hr 36 min 51.11 s. With 64 GPUs across 16 compute nodes, we can train ResNet-50 in 1 hr 7 min 51.31 s while maintaining the same top-1 and top-5 accuracy.
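As a quick sanity check, the two wall-clock times quoted above imply roughly an 18x speedup going from 2 to 64 GPUs, or about 57% scaling efficiency relative to perfect linear scaling. The snippet below (illustrative only, not code from the repo) reproduces the arithmetic:

```python
def to_seconds(hr, mn, sec):
    """Convert an hr/min/sec triple to total seconds."""
    return hr * 3600 + mn * 60 + sec

t_2gpu = to_seconds(20, 36, 51.11)   # 2-GPU run, from the numbers above
t_64gpu = to_seconds(1, 7, 51.31)    # 64-GPU run, from the numbers above

speedup = t_2gpu / t_64gpu           # measured speedup going 2 -> 64 GPUs
ideal = 64 / 2                       # perfect linear scaling would give 32x
efficiency = speedup / ideal

print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

Sub-linear scaling at this node count is expected: gradient all-reduce traffic and data-loading overhead grow with the number of workers.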

...

I/O Bandwidth: I/O bandwidth (GB/s) and IOPS of our file system throughout the full-system ImageNet training run on 64 GPUs. Between the 10th and 60th epochs, the average bandwidth is 3.30 GB/s and the average IOPS is 36.5K.
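Dividing the average bandwidth by the average IOPS gives the implied average I/O request size, a useful sanity check for a data-loading pipeline that reads individual JPEG files. This quick calculation (assuming decimal GB, i.e. 10^9 bytes) uses only the two figures quoted above:

```python
# Implied average request size from the reported steady-state numbers.
bandwidth_bytes = 3.30e9   # average bandwidth, bytes/s (assuming GB = 1e9 B)
iops = 36.5e3              # average I/O operations per second

avg_request_kb = bandwidth_bytes / iops / 1e3
print(f"average request size: {avg_request_kb:.1f} KB")
```

A result on the order of 90 KB per operation is consistent with reading ImageNet JPEGs, whose typical file size is around 100 KB.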

Software Stack

  • IBM WMLCE 1.6.2
  • Python 3.7
  • PyTorch 1.2.0
  • NVIDIA Apex 0.1.0
  • CUDA 10.1
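With this stack (PyTorch 1.2 era), multi-node DDP jobs are typically started via the `torch.distributed.launch` helper, one process per GPU. The fragment below is a sketch of a launch matching the 64-GPU configuration above (16 nodes x 4 GPUs); the training-script name and its arguments are placeholders, not the repo's exact CLI:

```shell
# Run once per node, with NODE_RANK set to 0..15 and MASTER_ADDR
# pointing at node 0. Spawns 4 worker processes (one per GPU) per node.
python -m torch.distributed.launch \
    --nnodes=16 \
    --node_rank=$NODE_RANK \
    --nproc_per_node=4 \
    --master_addr=$MASTER_ADDR \
    --master_port=29500 \
    train_imagenet.py --arch resnet50   # script name/args are illustrative
```

Each spawned process receives its `--local_rank`, which the training script uses to pin itself to one GPU before wrapping the model in `DistributedDataParallel`.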

...