IBM PowerAI 1.6.0

PowerAI is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:

Framework | Version | Description
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors.
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments.
PyTorch | 1.0.1 | PyTorch is an open source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors.
TensorFlow | 1.13.1 | TensorFlow is an end-to-end open source platform for machine learning. It is developed by Google and by community contributors.

For complete PowerAI documentation, see https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_getstarted.htm. Here we only show simple examples with system-specific instructions.

Major Anaconda Modules

Name | Version | Description
caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind.
cudatoolkit | 10.1.105 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications.
cudnn | 7.5.0+10.1 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
h5py | 2.8.0 | The h5py package is a Pythonic interface to the HDF5 binary data format.
jupyter | 1.0.0 | Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.
matplotlib | 2.2.3 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
nccl | 2.4.2 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.
numpy | 1.14.5 | NumPy is the fundamental package for scientific computing with Python.
opencv | 3.4.2 | OpenCV was designed for computational efficiency and with a strong focus on real-time applications.
pytables | 3.4.4 | PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
pytorch | 1.0.1 | PyTorch enables fast, flexible experimentation and efficient production through a hybrid front-end, distributed training, and ecosystem of tools and libraries.
scikit-learn | 0.19.1 | Simple and efficient tools for data mining and data analysis.
scipy | 1.1.0 | SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
tensorboard | 1.13.0 | To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard.
tensorflow-gpu | 1.13.1 | The core open source library to help you develop and train ML models.
torchvision | 0.2.1 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Simple Example with Caffe

Interactive mode

Request a node for interactive use:

Code Block
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
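Once the shell starts on the compute node, it can be useful to confirm that the job actually received a GPU. The sketch below assumes that Slurm exports CUDA_VISIBLE_DEVICES for --gres=gpu allocations, which is common but depends on the site configuration; the helper name visible_gpus is illustrative only.

```python
import os


def visible_gpus(env=None):
    """Return the GPU indices listed in CUDA_VISIBLE_DEVICES (may be empty)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list such as "0" or "0,1".
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("GPUs visible to this job: " + ", ".join(gpus))
    else:
        print("No GPUs listed in CUDA_VISIBLE_DEVICES; check the --gres request.")
```

An empty result does not prove that no GPU is attached; it only means the variable is unset or empty in the current shell.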

Once on the compute node, load the PowerAI module using one of these commands:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default
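After loading a module, a quick way to check which frameworks the active Python environment can import is to try each one and report its version. This is a minimal sketch: the import names (torch for pytorch, cv2 for opencv) are assumptions based on the package table above, and installed_versions is a hypothetical helper, not part of PowerAI.

```python
import importlib

# Import names are assumptions derived from the package table on this page.
MODULES = ["tensorflow", "torch", "caffe", "numpy", "cv2"]


def installed_versions(names):
    """Map each module name to its __version__, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Some modules do not define __version__; report that explicitly.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(MODULES).items()):
        print("{:<12} {}".format(name, version or "not importable"))
```

The versions printed should match the table above if the powerai/1.6.0 module is loaded; a "not importable" entry usually means the module was not loaded in the current shell.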

Install samples for Caffe:

Code Block
caffe-install-samples ~/caffe-samples
cd ~/caffe-samples

Download data for MNIST model:

Code Block
./data/mnist/get_mnist.sh

Convert data and create MNIST model:

Code Block
./examples/mnist/create_mnist.sh

Train LeNet on MNIST:

Code Block
./examples/mnist/train_lenet.sh

Batch mode

The same can be accomplished in batch mode using the following caffe_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.sb
sbatch caffe_sample.sb
squeue
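For scripted workflows, the submit-then-check pattern above can be wrapped in a few lines of Python. This is a sketch, not part of the provided samples: it relies on sbatch printing "Submitted batch job <id>" and on the squeue options -h (no header), -j (job ID), and -o "%T" (job state); the function names are hypothetical.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: {!r}".format(sbatch_output))
    return int(match.group(1))


def submit(script_path):
    """Submit a batch script with sbatch and return its job ID."""
    output = subprocess.check_output(["sbatch", script_path])
    return parse_job_id(output.decode())


def job_state(job_id):
    """Return the state squeue reports for the job, or None if it is not listed."""
    try:
        output = subprocess.check_output(
            ["squeue", "-h", "-j", str(job_id), "-o", "%T"]
        )
    except subprocess.CalledProcessError:
        # squeue exits with an error once the job ID is no longer known.
        return None
    return output.decode().strip() or None
```

For example, job = submit("caffe_sample.sb") followed by job_state(job) would typically return PENDING or RUNNING while the job is queued or active.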

Simple Example with Caffe2

Interactive mode

Request a node for interactive use:

Code Block
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these commands:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Install samples for Caffe2:

Code Block
caffe2-install-samples ~/caffe2-samples
cd ~/caffe2-samples

Create a sample LMDB dataset:

Code Block
python ./examples/lmdb_create_example.py --output_file lmdb

Train ResNet50 with Caffe2:

Code Block
python ./examples/resnet50_trainer.py --train_data ./lmdb

Batch mode

The same can be accomplished in batch mode using the following caffe2_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.sb
sbatch caffe2_sample.sb
squeue

Simple Example with TensorFlow

Interactive mode

Request a node for interactive use:

Code Block
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these commands:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Copy the following code into a file named "mnist-demo.py":

Code Block
import tensorflow as tf
mnist = tf.keras.datasets.mnist

# Load MNIST and scale pixel values from [0, 255] to [0, 1].
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Fully connected classifier: 28x28 image -> 512 hidden units -> 10 classes.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Labels are integer class indices, hence the "sparse" loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report loss and accuracy on the test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Train on MNIST with the Keras API:

Code Block
python ./mnist-demo.py
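The loss value printed during training comes from the sparse_categorical_crossentropy loss named in model.compile. As an illustration of what that number means, the following pure-Python sketch computes the same quantity for a tiny hand-made batch: the mean of -log(p[label]) over the samples. It is a simplified stand-in, not the Keras implementation, which additionally clips probabilities for numerical stability.

```python
import math


def sparse_categorical_crossentropy(labels, probabilities):
    """Mean of -log(p[label]) over a batch of softmax outputs."""
    losses = [
        -math.log(probs[label])
        for label, probs in zip(labels, probabilities)
    ]
    return sum(losses) / len(losses)


# Two samples, three classes; labels are class indices, not one-hot vectors.
labels = [0, 2]
probabilities = [
    [0.7, 0.2, 0.1],  # true class 0 predicted with probability 0.7
    [0.1, 0.1, 0.8],  # true class 2 predicted with probability 0.8
]
print(sparse_categorical_crossentropy(labels, probabilities))
```

The loss approaches zero as the probability assigned to the correct class approaches one, which is why the reported loss falls as training progresses.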

Batch mode

The same can be accomplished in batch mode using the following tf_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.sb
sbatch tf_sample.sb
squeue

Visualization with TensorBoard

Interactive mode

Request a node for interactive use:

Code Block
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these commands:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Download the script mnist-with-summaries.py to your $HOME folder:

Code Block
cd ~
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/mnist-with-summaries.py

Train on MNIST with TensorFlow summaries enabled, then return to the login node:

Code Block
python ./mnist-with-summaries.py
exit

Batch mode

The same can be accomplished in batch mode using the following tfbd_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.sb
sbatch tfbd_sample.sb
squeue
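Before starting TensorBoard, it can help to confirm that the run actually wrote summary data. The sketch below searches the log directory this page uses, ~/tensorflow/mnist/logs, for files whose names start with events.out.tfevents, the prefix TensorFlow gives its event files; find_event_files is a hypothetical helper name.

```python
import os


def find_event_files(logdir):
    """Return sorted paths of TensorFlow event files found anywhere under logdir."""
    logdir = os.path.expanduser(logdir)
    found = []
    # os.walk yields nothing if logdir does not exist, so the result is then empty.
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_event_files("~/tensorflow/mnist/logs"):
        print(path)
```

If nothing is printed, the training job probably has not finished or wrote its logs elsewhere, and TensorBoard would start with an empty dashboard.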

Start the TensorBoard session

After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on the login node:

Code Block
module load powerai
tensorboard --logdir ~/tensorflow/mnist/logs/ --port [user_pick_port] # pick a random port number in the range 6500-6999

From your local machine, forward port [user_pick_port] on the remote machine to port 16006 on the local machine:

Code Block
ssh -N -f -L localhost:16006:localhost:[user_pick_port] your_user_name@hal.ncsa.illinois.edu

Paste the following address into a web browser to open the TensorBoard session:

Code Block
localhost:16006
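The [user_pick_port] placeholder in the commands above can be chosen programmatically so that it falls in the suggested 6500-6999 range and is not already taken. This is a Python 3 sketch to run on the login node; pick_port and is_port_free are hypothetical helpers, and the check only tests whether the port can be bound on 127.0.0.1 at that moment, so another user could still claim it before TensorBoard starts.

```python
import random
import socket

PORT_RANGE = (6500, 6999)  # range suggested on this page


def is_port_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(is_free=is_port_free, attempts=50):
    """Try random ports in PORT_RANGE until one passes the is_free check."""
    low, high = PORT_RANGE
    for _ in range(attempts):
        port = random.randint(low, high)
        if is_free(port):
            return port
    raise RuntimeError("no free port found in {}-{}".format(low, high))
```

Calling pick_port() returns a number that can be substituted for [user_pick_port] in both the tensorboard and ssh commands; the same value must be used in each.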

Simple Example with PyTorch

Interactive mode

Request a node for interactive use:

Code Block
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these commands:

Code Block
module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Install samples for PyTorch:

Code Block
pytorch-install-samples ~/pytorch-samples
cd ~/pytorch-samples

Train on MNIST with PyTorch:

Code Block
python ./examples/mnist/main.py

Batch mode

The same can be accomplished in batch mode using the following pytorch_sample.sb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.sb
sbatch pytorch_sample.sb
squeue