ImageNet Distributed Mixed-precision Training with PyTorch
References
Source code: https://github.com/richardkxu/distributed-pytorch
Overview
We will cover the following training methods for PyTorch:
- Regular, single node, single GPU training
- torch.nn.DataParallel
- torch.nn.DistributedDataParallel
- Mixed precision training with NVIDIA Apex
- TensorBoard logging under distributed training context
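The wrapping options in the list differ mainly in how many processes run and how the model is placed on devices. The sketch below shows that difference, plus the common rank-0 guard used to keep TensorBoard logging to a single process in distributed runs. It assumes a recent PyTorch release. `wrap_model`, `is_main_process`, and `log_scalar` are hypothetical helpers, not functions from the linked repository. Mixed precision with Apex is not shown.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def is_main_process() -> bool:
    """True on global rank 0, or when no process group exists (single GPU / DataParallel)."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank() == 0
    return True


def wrap_model(model: nn.Module, mode: str) -> nn.Module:
    """Hypothetical helper: prepare `model` for "single", "dp", or "ddp" training."""
    # Under DDP, each process is assumed to have called torch.cuda.set_device(local_rank),
    # so "cuda" below resolves to that process's own GPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if mode == "single":
        # One process, one device: no wrapper needed.
        return model
    if mode == "dp":
        # One process that splits each batch across all visible GPUs
        # (DataParallel calls the module directly when no GPU is present).
        return nn.DataParallel(model)
    if mode == "ddp":
        # One process per GPU; init_process_group must already have been called.
        if device.type == "cuda":
            return DDP(model, device_ids=[torch.cuda.current_device()])
        return DDP(model)  # CPU module: device_ids must be left unset
    raise ValueError(f"unknown mode: {mode!r}")


def log_scalar(name: str, value: float, step: int) -> None:
    """Hypothetical logging hook: only rank 0 writes, so N processes don't log N copies."""
    if is_main_process():
        # In real code this would be e.g. a torch.utils.tensorboard.SummaryWriter.add_scalar call.
        print(f"step {step}: {name}={value:.4f}")
```

DataParallel runs in a single process and scatters each batch across GPUs. DistributedDataParallel expects one process per GPU, typically started by a launcher such as `torchrun`, which is why a process group must be initialized before wrapping.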
We will cover the following use cases:
- Single node single GPU training
- Single node multi-GPU training
- Multi-node multi-GPU training
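The three use cases differ mostly in how many processes take part and how each one is numbered. The sketch below expresses this as rank arithmetic. It assumes one process per GPU and the usual static layout used with `torch.distributed.launch`/`torchrun` when each node is given an explicit `--node_rank`, where a process's global rank is `node_rank * nproc_per_node + local_rank`. `LaunchConfig` and the GPU counts are illustrative assumptions, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    """Hypothetical launch description: nnodes nodes, nproc_per_node processes (GPUs) each."""
    nnodes: int
    nproc_per_node: int

    @property
    def world_size(self) -> int:
        # Total number of training processes across all nodes.
        return self.nnodes * self.nproc_per_node

    def global_rank(self, node_rank: int, local_rank: int) -> int:
        # Ranks are laid out node by node: node 0 holds ranks 0..nproc_per_node-1, and so on.
        if not (0 <= node_rank < self.nnodes and 0 <= local_rank < self.nproc_per_node):
            raise ValueError("node_rank or local_rank out of range")
        return node_rank * self.nproc_per_node + local_rank


# The three use cases above, with illustrative GPU counts.
single_node_single_gpu = LaunchConfig(nnodes=1, nproc_per_node=1)
single_node_multi_gpu = LaunchConfig(nnodes=1, nproc_per_node=4)
multi_node_multi_gpu = LaunchConfig(nnodes=2, nproc_per_node=4)
```

For example, with two nodes of four GPUs each, the process on node 1 using local GPU 2 has global rank 6 out of a world size of 8.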