IBM Watson Machine Learning Community Edition (WMLCE-1.6.1)

PowerAI is an enterprise software distribution that combines popular open-source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:

Framework | Version | Description
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors.
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments.
PyTorch | 1.1.0 | PyTorch is an open-source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors.
TensorFlow | 1.14.0 | TensorFlow is an end-to-end open-source platform for machine learning. It is developed by Google and by community contributors.

For complete PowerAI documentation, see https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_getstarted.htm. Here we only show simple examples with system-specific instructions.

Simple Example with Caffe

Interactive mode

Get one compute node for interactive use:

Code Block
swrun -p gpux1

Once on the compute node, load the PowerAI module using one of these:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai             # python3 environment by default
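
After loading a module, it can help to confirm that the frameworks are importable from the active Python. The following is a minimal sketch, not part of PowerAI: `check_import` is a hypothetical helper, the `PYTHON` variable is an assumed override for the interpreter name, and the package names `caffe`, `torch`, and `tensorflow` are the usual Python import names for the frameworks listed above.

```shell
# Hypothetical helper (not provided by PowerAI): report whether a Python
# package can be imported by the interpreter named in $PYTHON (default: python).
check_import() {
    pkg="$1"
    py="${PYTHON:-python}"
    if "$py" -c "import $pkg" >/dev/null 2>&1; then
        echo "$pkg: ok"
    else
        echo "$pkg: missing"
    fi
}

# Check the usual import names of the frameworks in the loaded environment.
for pkg in caffe torch tensorflow; do
    check_import "$pkg"
done
```

A "missing" result usually means the module was not loaded in the current shell, or that a different Python is first on the PATH.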

...

Code Block
./examples/mnist/train_lenet.sh

Batch mode

The same can be accomplished in batch mode using the following caffe_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.sb
sbatch caffe_sample.sb
squeue
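
To follow one specific job rather than the whole queue, the job ID printed by sbatch can be captured and reused. The sketch below assumes sbatch prints its standard "Submitted batch job <id>" message; `job_id_from_sbatch` is a hypothetical helper name, and the cluster commands are shown only as comments.

```shell
# Hypothetical helper: extract the numeric job ID from sbatch's
# "Submitted batch job <id>" message read on standard input.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Usage on the cluster:
#   jobid=$(sbatch caffe_sample.sb | job_id_from_sbatch)
#   squeue -j "$jobid"     # show only this job
#   scancel "$jobid"       # cancel it if needed
```

The same pattern applies to the other sample scripts on this page.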

Simple Example with Caffe2

Interactive mode

Get a node for interactive use:

...

Code Block
python ./examples/resnet50_trainer.py --train_data ./lmdb

Batch mode

The same can be accomplished in batch mode using the following caffe2_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.sb
sbatch caffe2_sample.sb
squeue

Simple Example with TensorFlow

Interactive mode

Get a node for interactive use:

...

Code Block
python ./mnist-demo.py

Batch mode

The same can be accomplished in batch mode using the following tf_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.sb
sbatch tf_sample.sb
squeue
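
The contents of tf_sample.sb are not reproduced on this page. For readers who want to write their own script, the following is a minimal sketch of what a comparable Slurm batch script might look like. It is not the actual tf_sample.sb: the file name `my_tf_job.sb`, the job name, the partition, and the resource values are all assumptions to adjust for your allocation.

```shell
# Write a hypothetical batch script to the current directory.
# This is an illustration only, NOT the contents of tf_sample.sb.
cat > my_tf_job.sb <<'EOF'
#!/bin/bash
#SBATCH --job-name=tf_mnist          # hypothetical job name
#SBATCH --partition=gpux1            # assumption: use a partition available to you
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8          # assumption: adjust to your needs
#SBATCH --gres=gpu:v100:1            # assumption: one V100 GPU
#SBATCH --time=01:30:00              # assumption: wall-time limit
#SBATCH --output=tf_mnist.%j.out     # %j expands to the Slurm job ID

module load powerai                  # python3 environment by default
# Assumes the job is submitted from the directory containing mnist-demo.py.
python ./mnist-demo.py
EOF

# Submit it on the cluster with:
#   sbatch my_tf_job.sb
```

Slurm starts the job in the directory it was submitted from, so the relative path to the Python script resolves the same way as in interactive mode.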

Visualization with TensorBoard

Interactive mode

Get a node for interactive use:

...

Code Block
python ./mnist-with-summaries.py
exit

Batch mode

The same can be accomplished in batch mode using the following tfbd_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.sb
sbatch tfbd_sample.sb
squeue

Start the TensorBoard session

After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on the login node:

...

Code Block
localhost:16006

Simple Example with PyTorch

Interactive mode

Get a node for interactive use:

...

Code Block
python ./examples/mnist/main.py

Batch mode

The same can be accomplished in batch mode using the following pytorch_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.sb
sbatch pytorch_sample.sb
squeue

Major Installed PowerAI Related Anaconda Modules

Name | Version | Description
caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind.
cudatoolkit | 10.1.105 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications.
cudnn | 7.5.0+10.1 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
h5py | 2.8.0 | The h5py package is a Pythonic interface to the HDF5 binary data format.
jupyter | 1.0.0 | Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.
matplotlib | 2.2.3 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
nccl | 2.4.2 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.
numpy | 1.14.5 | NumPy is the fundamental package for scientific computing with Python.
opencv | 3.4.2 | OpenCV was designed for computational efficiency and with a strong focus on real-time applications.
pytables | 3.4.4 | PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
pytorch | 1.0.1 | PyTorch enables fast, flexible experimentation and efficient production through a hybrid front-end, distributed training, and ecosystem of tools and libraries.
scikit-learn | 0.19.1 | Simple and efficient tools for data mining and data analysis.
scipy | 1.1.0 | SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
tensorboard | 1.13.0 | To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard.
tensorflow-gpu | 1.13.1 | The core open source library to help you develop and train ML models.
torchvision | 0.2.1 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

...