IBM PowerAI 1.6.0

PowerAI is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:

Framework | Version | Description
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors.
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments.
PyTorch | 1.0.1 | PyTorch is an open source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors.
TensorFlow | 1.13.1 | TensorFlow is an end-to-end open source platform for machine learning. It is developed by Google and by community contributors.

For complete PowerAI documentation, see https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_getstarted.htm. Here we only show simple examples with system-specific instructions.

Major Anaconda Modules

Name | Version | Description
caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind.
cudatoolkit | 10.1.105 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications.
cudnn | 7.5.0+10.1 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
h5py | 2.8.0 | The h5py package is a Pythonic interface to the HDF5 binary data format.
jupyter | 1.0.0 | Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.
matplotlib | 2.2.3 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
nccl | 2.4.2 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.
numpy | 1.14.5 | NumPy is the fundamental package for scientific computing with Python.
opencv | 3.4.2 | OpenCV was designed for computational efficiency and with a strong focus on real-time applications.
pytables | 3.4.4 | PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
pytorch | 1.0.1 | PyTorch enables fast, flexible experimentation and efficient production through a hybrid front-end, distributed training, and ecosystem of tools and libraries.
scikit-learn | 0.19.1 | Simple and efficient tools for data mining and data analysis.
scipy | 1.1.0 | SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
tensorboard | 1.13.0 | To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard.
tensorflow-gpu | 1.13.1 | The core open source library to help you develop and train ML models.
torchvision | 0.2.1 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Simple Example with Caffe

Interactive mode

Get a node for interactive use:

srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these:

module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default
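After loading the module, it can be useful to confirm which frameworks the active Python environment actually provides. The following is a minimal sketch, not part of PowerAI: report_versions is a hypothetical helper, and the import names passed to it (caffe, torch, tensorflow) are assumptions about how the packages listed above are imported.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its version string, "unknown", or None.

    None means the module could not be imported in this environment.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or a failure loading a native library
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Import names below are assumptions; adjust them to the frameworks you use.
    found = report_versions(["caffe", "torch", "tensorflow"])
    for name, version in sorted(found.items()):
        print("{:<12} {}".format(name, version if version else "not importable"))
```

Running this once under each of the py2.7 and py3.6 modules shows whether the two environments expose the same set of frameworks.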

Install samples for Caffe:

caffe-install-samples ~/caffe-samples
cd ~/caffe-samples

Download data for MNIST model:

./data/mnist/get_mnist.sh

Convert data and create MNIST model:

./examples/mnist/create_mnist.sh

Train LeNet on MNIST:

./examples/mnist/train_lenet.sh

Batch mode

The same can be accomplished in batch mode using the following caffe_sample.sb script:

wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.sb
sbatch caffe_sample.sb
squeue
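The contents of caffe_sample.sb are not reproduced on this page. As a rough sketch only, and assuming a batch script that mirrors the flags of the interactive srun command above, the following Python helper renders such a script as text. The function render_sbatch_script, the job name, and the output file pattern are all hypothetical; the downloaded script may differ.

```python
def render_sbatch_script(commands, job_name="caffe_sample", partition="debug",
                         gpus="v100:1", tasks_per_node=8, time_limit="01:30:00"):
    """Build the text of a Slurm batch script that runs `commands` in order."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name={}".format(job_name),
        "#SBATCH --output={}_%j.out".format(job_name),  # %j expands to the job ID
        "#SBATCH --partition={}".format(partition),
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks-per-node={}".format(tasks_per_node),
        "#SBATCH --gres=gpu:{}".format(gpus),
        "#SBATCH --time={}".format(time_limit),
        "",
        "module load powerai",
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The same steps as the interactive Caffe example above.
    print(render_sbatch_script([
        "caffe-install-samples ~/caffe-samples",
        "cd ~/caffe-samples",
        "./data/mnist/get_mnist.sh",
        "./examples/mnist/create_mnist.sh",
        "./examples/mnist/train_lenet.sh",
    ]))
```

The printed text can be saved to a file and submitted with sbatch in the same way as the downloaded script.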

Simple Example with Caffe2

Interactive mode

Get a node for interactive use:

srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these:

module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Install samples for Caffe2:

caffe2-install-samples ~/caffe2-samples
cd ~/caffe2-samples

Create example data in LMDB format:

python ./examples/lmdb_create_example.py --output_file lmdb

Train ResNet50 with Caffe2:

python ./examples/resnet50_trainer.py --train_data ./lmdb

Batch mode

The same can be accomplished in batch mode using the following caffe2_sample.sb script:

wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.sb
sbatch caffe2_sample.sb
squeue

Simple Example with TensorFlow

Interactive mode

Get a node for interactive use:

srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these:

module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Copy the following code into a file named "mnist-demo.py":

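import tensorflow as tf

# Load the MNIST handwritten-digit dataset (downloaded on first use).
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0

# Fully connected classifier: flatten each 28x28 image, one hidden layer,
# dropout for regularization, and a 10-way softmax output.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report loss and accuracy on the test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)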

Train on MNIST with the Keras API:

python ./mnist-demo.py
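The script above compiles the model with a softmax output layer and the 'sparse_categorical_crossentropy' loss. As an illustration of what that combination computes for a single sample, here is a simplified pure-Python sketch. It is not the Keras implementation (which, among other things, works on batches and guards against log(0)), and the function names are chosen for this example only.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: minus the log of the probability of the true class.

    "Sparse" means the label is an integer class index, not a one-hot vector.
    """
    return -math.log(probabilities[label])


# An untrained model that scores all 10 digits equally assigns probability 0.1
# to each class, so the loss is ln(10).
probs = softmax([0.0] * 10)
loss = sparse_categorical_crossentropy(3, probs)
print(round(loss, 4))  # 2.3026
```

A loss near 2.3 at the start of training therefore means the model is doing no better than guessing uniformly among the ten digits.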

Batch mode

The same can be accomplished in batch mode using the following tf_sample.sb script:

wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.sb
sbatch tf_sample.sb
squeue

Visualization with TensorBoard

Interactive mode

Get a node for interactive use:

srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these:

module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Download the script mnist-with-summaries.py to your $HOME folder:

cd ~
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/mnist-with-summaries.py

Train on MNIST with TensorFlow summaries, then return to the login node:

python ./mnist-with-summaries.py
exit

Batch mode

The same can be accomplished in batch mode using the following tfbd_sample.sb script:

wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.sb
sbatch tfbd_sample.sb
squeue

Start the TensorBoard session

After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on the login node:

module load powerai
tensorboard --logdir ~/tensorflow/mnist/logs/ --port [user_pick_port] # pick a random port number in the range 6500-6999
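Because the login node is shared, a randomly chosen port may already be in use by someone else. One possible way to choose [user_pick_port] is sketched below, assuming it is run on the login node. The helpers pick_port and port_is_free are hypothetical, and the check only tests the loopback address, so treat the result as a best guess.

```python
import random
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()


def pick_port(low=6500, high=6999, attempts=50, is_free=port_is_free):
    """Try random ports in [low, high] until one passes the is_free check."""
    for _ in range(attempts):
        port = random.randint(low, high)
        if is_free(port):
            return port
    raise RuntimeError("no free port found in {}-{}".format(low, high))


if __name__ == "__main__":
    print(pick_port())
```

The printed number can then be passed to tensorboard with --port. Another user could still claim the port between the check and the start of the server, in which case simply pick again.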

On your local machine, forward [user_pick_port] on the remote machine to port 16006 on the local machine:

ssh -N -f -L localhost:16006:localhost:[user_pick_port] your_user_name@hal.ncsa.illinois.edu

Paste the following address into a web browser to open the TensorBoard session:

localhost:16006
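The forwarding step has three moving parts: the remote port chosen for TensorBoard, the local port 16006, and your user name. The sketch below assembles them into the ssh command and the browser address shown above. Both function names are hypothetical, and "alice" and 6789 are placeholder values.

```python
def tunnel_command(user, remote_port, local_port=16006,
                   host="hal.ncsa.illinois.edu"):
    """Return the ssh argument list that forwards local_port to remote_port on host."""
    forward = "localhost:{}:localhost:{}".format(local_port, remote_port)
    # -N: run no remote command, -f: go to background, -L: local port forwarding
    return ["ssh", "-N", "-f", "-L", forward, "{}@{}".format(user, host)]


def tensorboard_url(local_port=16006):
    """Return the address to open in the local web browser."""
    return "http://localhost:{}".format(local_port)


if __name__ == "__main__":
    # Placeholder user name and remote port; substitute your own.
    print(" ".join(tunnel_command("alice", 6789)))
    print(tensorboard_url())
```

The argument list can be passed to subprocess.call on the local machine, or printed and pasted into a terminal.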

Simple Example with PyTorch

Interactive mode

Get a node for interactive use:

srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash

Once on the compute node, load the PowerAI module using one of these:

module load powerai/1.6.0-py2.7 # for python2 environment
module load powerai/1.6.0-py3.6 # for python3 environment
module load powerai           # python3 environment by default

Install samples for PyTorch:

pytorch-install-samples ~/pytorch-samples
cd ~/pytorch-samples

Train on MNIST with PyTorch:

python ./examples/mnist/main.py
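If training runs more slowly than expected, it may be worth checking that the GPU requested with --gres is visible to PyTorch. The following is a minimal sketch: describe_gpus is a hypothetical helper, and it returns a plain dictionary so that it also works where PyTorch is not installed.

```python
def describe_gpus():
    """Report whether PyTorch is importable and which CUDA devices it can see."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    devices = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch_installed": True, "cuda_available": available, "devices": devices}


if __name__ == "__main__":
    info = describe_gpus()
    print("PyTorch installed:", info["torch_installed"])
    print("CUDA available:   ", info["cuda_available"])
    for index, name in enumerate(info["devices"]):
        print("GPU {}: {}".format(index, name))
```

On a compute node allocated with --gres=gpu:v100:1, one device would be expected in the list. An empty list on the login node is not necessarily a problem.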

Batch mode

The same can be accomplished in batch mode using the following pytorch_sample.sb script:

wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.sb
sbatch pytorch_sample.sb
squeue