IBM PowerAI 1.6.0
PowerAI is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:
Framework | Version | Description |
---|---|---|
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors. |
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. |
PyTorch | 1.0.1 | PyTorch is an open source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors. |
TensorFlow | 1.13.1 | TensorFlow is an end-to-end open source platform for machine learning. It is developed by Google and by community contributors. |
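After loading a PowerAI module on a compute node (as in the examples below), you can check which of these frameworks the active Python environment can import, and at which versions. The following is a minimal sketch; the helper name framework_versions is hypothetical, and it assumes the import names tensorflow, torch, caffe and caffe2. It avoids Python 3 only syntax, so it should run under either the py2 or py3 module.

```python
import importlib

def framework_versions(names=("tensorflow", "torch", "caffe", "caffe2")):
    """Map each module name to its __version__, "unknown", or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not available in this environment
            continue
        # Not every package defines __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    for name, version in sorted(framework_versions().items()):
        print(name, version if version else "not importable")
```

With ibm/powerai/1.6.0.py3 loaded, tensorflow and torch would be expected to report the versions listed in the table above.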
For complete PowerAI documentation, see https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_getstarted.htm. This page covers only simple examples with system-specific instructions.
Simple Example for Caffe
Interactive mode
Get a compute node for interactive use:
    srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module with one of these commands:
    module load ibm/powerai/1.6.0.py2   # Python 2 environment
    module load ibm/powerai/1.6.0.py3   # Python 3 environment
    module load ibm/powerai             # Python 3 environment (default)
Install the Caffe samples and change into the samples directory:
    caffe-install-samples ~/caffe-samples
    cd ~/caffe-samples
Download the MNIST data:
    ./data/mnist/get_mnist.sh
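The MNIST files are stored in the IDX binary format. As an optional sanity check before converting them, a minimal sketch like the one below reads an IDX header with the Python 3 standard library. It assumes get_mnist.sh left uncompressed files such as data/mnist/train-images-idx3-ubyte in place; the helper names parse_idx_header and read_idx_file_header are hypothetical.

```python
import struct

def parse_idx_header(head):
    """Return (type_code, dims) from the first bytes of an IDX file.

    IDX layout (big-endian): two zero bytes, one type-code byte
    (0x08 = unsigned byte), one byte giving the number of dimensions,
    then one unsigned 32-bit size per dimension.
    """
    zero, type_code, ndim = struct.unpack(">HBB", head[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    dims = struct.unpack(">" + "I" * ndim, head[4:4 + 4 * ndim])
    return type_code, dims

def read_idx_file_header(path):
    """Read just enough of an IDX file to parse its header."""
    with open(path, "rb") as f:
        first = f.read(4)
        ndim = first[3]  # fourth byte holds the number of dimensions
        return parse_idx_header(first + f.read(4 * ndim))
```

For example, read_idx_file_header("data/mnist/train-images-idx3-ubyte") should report dimensions (60000, 28, 28): 60,000 training images of 28x28 pixels.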
Convert the MNIST data into LMDB databases for training and testing:
    ./examples/mnist/create_mnist.sh
Train LeNet on MNIST:
    ./examples/mnist/train_lenet.sh
Batch mode
The same steps can be run in batch mode with the caffe_sample.sb script. Download it, submit it with sbatch, and check its status with squeue:
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.sb
    sbatch caffe_sample.sb
    squeue
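The contents of caffe_sample.sb are not reproduced here. As an illustration only, a batch script requesting the same resources as the interactive srun command above would carry matching #SBATCH directives. The Python 3 sketch below generates such a script; the helper make_batch_script, the job name, and the output file pattern are hypothetical, and the result is not the downloadable script.

```python
def make_batch_script(commands, job_name="caffe_sample", partition="debug",
                      nodes=1, ntasks_per_node=8, gres="gpu:v100:1",
                      time_limit="01:30:00"):
    """Return the text of a Slurm batch script (hypothetical sketch).

    The resource directives mirror the srun options used for interactive
    work on this page; the job name and output pattern are illustrative.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --gres={gres}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the job ID
        "",
        "module load ibm/powerai/1.6.0.py3",
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # The same Caffe steps as in interactive mode.
    print(make_batch_script([
        "caffe-install-samples ~/caffe-samples",
        "cd ~/caffe-samples",
        "./data/mnist/get_mnist.sh",
        "./examples/mnist/create_mnist.sh",
        "./examples/mnist/train_lenet.sh",
    ]), end="")
```

Redirecting the output to a file (for example python make_script.py > my_job.sb, both names hypothetical) gives a script that can be submitted with sbatch my_job.sb.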
Simple Example for Caffe2
Interactive mode
Get a compute node for interactive use:
    srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module with one of these commands:
    module load ibm/powerai/1.6.0.py2   # Python 2 environment
    module load ibm/powerai/1.6.0.py3   # Python 3 environment
    module load ibm/powerai             # Python 3 environment (default)
Install the Caffe2 samples and change into the samples directory:
    caffe2-install-samples ~/caffe2-samples
    cd ~/caffe2-samples
Create an example LMDB database filled with randomly generated images:
    python ./examples/lmdb_create_example.py --output_file lmdb
Train ResNet50 with Caffe2:
    python ./examples/resnet50_trainer.py --train_data ./lmdb
Batch mode
The same steps can be run in batch mode with the caffe2_sample.sb script. Download it, submit it with sbatch, and check its status with squeue:
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.sb
    sbatch caffe2_sample.sb
    squeue
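Plain squeue lists every job in the queue. To follow only your own jobs from a script, one option is to ask squeue for a fixed format and parse it: -h suppresses the header, -u selects the user, and -o "%i %T" prints the job ID and the full state name. The Python 3 sketch below does this; the helper names parse_squeue and my_jobs are hypothetical.

```python
import getpass
import subprocess

def parse_squeue(output):
    """Map job ID -> state from `squeue -h -o "%i %T"` output."""
    jobs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            job_id, state = parts
            jobs[job_id] = state
    return jobs

def my_jobs():
    """Run squeue for the current user (requires Slurm on the PATH)."""
    out = subprocess.run(
        ["squeue", "-h", "-u", getpass.getuser(), "-o", "%i %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(out)
```

A job that no longer appears in the result has left the queue (it finished, failed, or was cancelled), so check its output file for the outcome.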
Simple Example for TensorFlow
Interactive mode
Get a compute node for interactive use:
    srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module with one of these commands:
    module load ibm/powerai/1.6.0.py2   # Python 2 environment
    module load ibm/powerai/1.6.0.py3   # Python 3 environment
    module load ibm/powerai             # Python 3 environment (default)
Save the following code as mnist-demo.py:
    import tensorflow as tf
    # Load MNIST and scale pixel values from 0-255 to 0-1
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    # A small fully connected classifier: 28x28 image -> 10 class probabilities
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    # Labels are integers, so use the sparse variant of categorical cross-entropy
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # Train for 5 epochs, then report loss and accuracy on the test set
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)
Train on MNIST with the Keras API:
    python ./mnist-demo.py
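The model in mnist-demo.py is compiled with loss='sparse_categorical_crossentropy', which takes integer class labels (0 to 9 for MNIST) rather than one-hot vectors. For a single sample the loss is the negative log of the probability the softmax layer assigns to the true class, and training reports the mean over a batch. The plain-Python sketch below illustrates only that arithmetic, using made-up probabilities; it is not the Keras implementation, which works on batched tensors.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample: -log(probability assigned to the true class)."""
    return -math.log(probs[label])

def mean_loss(batch_probs, labels):
    """Average the per-sample losses over a batch."""
    losses = [sparse_categorical_crossentropy(p, y)
              for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)

# Two samples over three classes (made-up softmax outputs for illustration).
batch = [[0.1, 0.7, 0.2], [0.5, 0.25, 0.25]]
labels = [1, 0]
print(round(mean_loss(batch, labels), 4))  # prints 0.5249
```

A confident correct prediction (probability near 1 for the true class) drives the loss toward 0, while a low probability for the true class is penalized heavily.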
Batch mode
The same steps can be run in batch mode with the tf_sample.sb script. Download it, submit it with sbatch, and check its status with squeue:
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.sb
    sbatch tf_sample.sb
    squeue
Visualization with TensorBoard
Interactive mode
Get a compute node for interactive use:
    srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module with one of these commands:
    module load ibm/powerai/1.6.0.py2   # Python 2 environment
    module load ibm/powerai/1.6.0.py3   # Python 3 environment
    module load ibm/powerai             # Python 3 environment (default)
Download mnist-with-summaries.py to your $HOME directory:
    cd ~
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/mnist-with-summaries.py
Train on MNIST with TensorFlow summaries enabled, then exit back to the login node:
    python ./mnist-with-summaries.py
    exit
Batch mode
The same steps can be run in batch mode with the tfbd_sample.sb script. Download it, submit it with sbatch, and check its status with squeue:
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.sb
    sbatch tfbd_sample.sb
    squeue
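TensorBoard reads event files whose names begin with events.out.tfevents. Before starting the server, a quick check like the Python 3 sketch below can confirm that training wrote some under ~/tensorflow/mnist/logs; the helper name find_event_files is hypothetical.

```python
import os

def find_event_files(logdir):
    """Return sorted paths of TensorBoard event files anywhere under logdir."""
    logdir = os.path.expanduser(logdir)
    found = []
    # os.walk yields nothing for a missing directory, so this returns [].
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(root, name))
    return sorted(found)

if __name__ == "__main__":
    files = find_event_files("~/tensorflow/mnist/logs")
    print(f"{len(files)} event file(s) found")
    for path in files:
        print(path)
```

An empty result usually means the job has not finished yet or wrote its logs somewhere else.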
Start the TensorBoard session
After the job completes, the TensorFlow log files are in ~/tensorflow/mnist/logs. Start the TensorBoard server on the login node:
    tensorboard --logdir ~/tensorflow/mnist/logs/ --port [user_pick_port]   # pick a random port number between 6500 and 6999
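Because the login node is shared, a randomly chosen port may already be in use. A minimal Python 3 sketch, assuming it is run on the login node just before starting TensorBoard, picks a random port in the suggested 6500-6999 range and checks that it can currently be bound on localhost. The helper name pick_free_port is hypothetical. The port is released again before TensorBoard starts, so another process could still take it in between.

```python
import random
import socket

def pick_free_port(low=6500, high=6999, attempts=50):
    """Return a random port in [low, high] that can currently be bound on localhost."""
    for _ in range(attempts):
        port = random.randint(low, high)  # randint includes both endpoints
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port already in use; try another one
            return port  # the socket is closed when the with block exits
    raise RuntimeError(f"no free port found in {low}-{high}")

if __name__ == "__main__":
    print(pick_free_port())
```

If saved as pick_port.py (a hypothetical name), it could be used as tensorboard --logdir ~/tensorflow/mnist/logs/ --port $(python pick_port.py).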
On your local machine, forward [user_pick_port] on the remote machine to port 16006 on the local machine (replace dmu with your own username):
    ssh -N -f -L localhost:16006:localhost:[user_pick_port] dmu@hal
Open the following address in a web browser to view the TensorBoard session:
    http://localhost:16006
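If the page does not load, the tunnel may not be up. A small Python 3 check, run on your local machine, can tell whether anything is listening on local port 16006; the helper name port_is_open is hypothetical.

```python
import socket

def port_is_open(host="localhost", port=16006, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False

if __name__ == "__main__":
    if port_is_open():
        print("Tunnel is listening: open http://localhost:16006")
    else:
        print("Nothing is listening on localhost:16006; check the ssh command")
```

A successful connection only shows that the local end of the ssh tunnel is listening; it does not confirm that TensorBoard is running on the remote side.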
Simple Example for PyTorch
Interactive mode
Get a compute node for interactive use:
    srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module with one of these commands:
    module load ibm/powerai/1.6.0.py2   # Python 2 environment
    module load ibm/powerai/1.6.0.py3   # Python 3 environment
    module load ibm/powerai             # Python 3 environment (default)
Install the PyTorch samples and change into the samples directory:
    pytorch-install-samples ~/pytorch-samples
    cd ~/pytorch-samples
Train on MNIST with PyTorch:
    python ./examples/mnist/main.py
Batch mode
The same steps can be run in batch mode with the pytorch_sample.sb script. Download it, submit it with sbatch, and check its status with squeue:
    wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.sb
    sbatch pytorch_sample.sb
    squeue