Info: WMLCE has reached End-Of-Life and is now out of date.

See Getting started with Open Cognitive Environment (OpenCE, former WMLCE) for the latest software stack.


IBM Watson Machine Learning Community Edition (WMLCE-1.7.0, WMLCE-1.6.2)

WMLCE (formerly PowerAI) is an enterprise software distribution that combines popular open-source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:

| Framework | Version | Description |
|---|---|---|
| Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors. |
| Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. |
| PyTorch | 1.3.1 | PyTorch is an open-source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors. |
| TensorFlow | 2.1.0 | TensorFlow is an end-to-end open-source platform for machine learning. It is developed by Google and by community contributors. |

For complete WMLCE documentation, see https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Here we only show simple examples with system-specific instructions.

Simple Example with Caffe

Interactive mode

Get one compute node for interactive use:

Code Block
swrun -p gpux1
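Before loading any modules, it can be worth confirming that the interactive session really landed on a node with a visible GPU. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of the compute node; `gpu_count` is a hypothetical helper name, not a HAL command.

```shell
# gpu_count is a hypothetical helper, not part of HAL or WMLCE.
# It counts the devices listed by "nvidia-smi -L" (one "GPU n: ..." line each)
# and prints 0 when nvidia-smi is missing or reports no devices.
gpu_count() {
    nvidia-smi -L 2>/dev/null | grep -c '^GPU '
}

echo "visible GPUs: $(gpu_count)"
```

A count of 0 usually means the shell is still on a login node or the job was not granted a GPU.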

Once on the compute node, load PowerAI module using one of these:

Code Block
module load wmlce/1.6.2
module load wmlce/1.7.0

Install samples for Caffe:

...

Code Block
./examples/mnist/train_lenet.sh

Batch mode

The same can be accomplished in batch mode using the following caffe_sample.swb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.swb
swbatch caffe_sample.swb
squeue
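Running squeue once only gives a snapshot of the queue. To block until one particular job has left the queue, a small polling loop can help. This is a minimal sketch, assuming the job ID is taken from the submission output or from the squeue listing; `wait_for_job` is a hypothetical helper, not a HAL or Slurm command.

```shell
# wait_for_job is a hypothetical helper, not a HAL or Slurm command.
# Usage: wait_for_job <jobid> [poll interval in seconds, default 30]
wait_for_job() {
    local jobid="$1" interval="${2:-30}"
    # squeue -h drops the header line and -j restricts output to one job;
    # the loop ends once squeue prints nothing for that job.
    while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}
```

For example, `wait_for_job 12345 60` checks once a minute, where 12345 stands in for a real job ID.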

Simple Example with Caffe2

Interactive mode

Get a node for interactive use:

Code Block
swrun -p gpux1

Once on the compute node, load PowerAI module using one of these:

Code Block
module load wmlce/1.6.2
module load wmlce/1.7.0

Install samples for Caffe2:

Code Block
caffe2-install-samples ~/caffe2-samples
cd ~/caffe2-samples

Download data with LMDB:

Code Block
python ./examples/lmdb_create_example.py --output_file lmdb

...

Code Block
python ./examples/resnet50_trainer.py --train_data ./lmdb

Batch mode

The same can be accomplished in batch mode using the following caffe2_sample.swb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.swb
sbatch caffe2_sample.swb
squeue

Simple Example with TensorFlow

Interactive mode

Get a node for interactive use:

Code Block
swrun -p gpux1

Once on the compute node, load PowerAI module using one of these:

Code Block
module load wmlce/1.6.2
module load wmlce/1.7.0

Copy the following code into file "mnist-demo.py":

...

Code Block
python ./mnist-demo.py

Batch mode

The same can be accomplished in batch mode using the following tf_sample.swb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.swb
sbatch tf_sample.swb
squeue

...

Visualization with TensorBoard

Interactive mode

Get a node for interactive use:

Code Block
swrun -p gpux1

Once on the compute node, load PowerAI module using one of these:

Code Block
module load wmlce/1.6.2
module load wmlce/1.7.0

Download the code mnist-with-summaries.py to $HOME folder:

Code Block
cd ~
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/mnist-with-summaries.py

Train on MNIST with TensorFlow summary:

Code Block
python ./mnist-with-summaries.py

Batch mode

The same can be accomplished in batch mode using the following tfbd_sample.swb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.swb
sbatch tfbd_sample.swb
squeue

Start the TensorBoard session

After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on hal-ondemand; for details, see Getting started with HAL OnDemand.
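Before starting TensorBoard, a quick check that the run actually wrote summary data can save a round trip. The sketch below assumes the default log location named above and relies on TensorFlow's usual `events.out.tfevents.*` file naming; `count_event_files` and the `LOGDIR` variable are hypothetical names introduced for illustration.

```shell
# count_event_files is a hypothetical helper, not part of TensorFlow or HAL.
# LOGDIR defaults to the path used in this example; override it for other runs.
LOGDIR="${LOGDIR:-$HOME/tensorflow/mnist/logs}"

count_event_files() {
    # Search recursively, since summaries may sit in subdirectories.
    # A missing directory simply yields a count of 0.
    find "$1" -type f -name 'events.out.tfevents.*' 2>/dev/null | wc -l | tr -d ' '
}

echo "event files under $LOGDIR: $(count_event_files "$LOGDIR")"
```

A count of 0 suggests the job failed before writing summaries or logged to a different directory.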

Simple Example with PyTorch

Interactive mode

Get a node for interactive use:

Code Block
swrun -p gpux1

Once on the compute node, load PowerAI module using one of these:

Code Block
module load wmlce/1.6.2
module load wmlce/1.7.0

Install samples for PyTorch:

...

Code Block
python ./examples/mnist/main.py

Batch mode

The same can be accomplished in batch mode using the following pytorch_sample.swb script:

Code Block
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.swb
sbatch pytorch_sample.swb
squeue

Major Installed PowerAI Related Anaconda Modules

| Name | Version | Description |
|---|---|---|
| caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. |
| cudatoolkit | 10.2.89 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high-performance GPU-accelerated applications. |
| cudnn | 7.6.5+10.2 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. |
| nccl | 2.5.6 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. |
| opencv | 3.4.8 | OpenCV was designed for computational efficiency and with a strong focus on real-time applications. |
| pytorch | 1.3.1 | PyTorch enables fast, flexible experimentation and efficient production through a hybrid front-end, distributed training, and ecosystem of tools and libraries. |
| tensorboard | 2.1.0 | To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard. |
| tensorflow-gpu | 2.1.0 | The core open-source library to help you develop and train ML models. |
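The versions listed here can be compared against what the loaded module actually provides. The following is a minimal sketch, assuming a wmlce module is already loaded and that each package exposes a `__version__` attribute; `print_version` and the `PYTHON` variable are hypothetical names, and the import names (`torch` for pytorch, `cv2` for opencv) differ from the package names.

```shell
# print_version is a hypothetical helper, not part of WMLCE.
# It imports one Python module and prints its __version__, or a notice
# when the module cannot be imported in the current environment.
print_version() {
    "${PYTHON:-python}" -c "import $1; print('$1', $1.__version__)" 2>/dev/null \
        || echo "$1 not importable"
}

for mod in caffe torch tensorflow tensorboard cv2; do
    print_version "$mod"
done
```

Set `PYTHON` to another interpreter, for example `PYTHON=python3`, if `python` is not the one supplied by the module.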