PowerAI is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:
Framework | Version | Description |
---|---|---|
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors. |
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. |
PyTorch | 1.0.1 | PyTorch is an open source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors. |
TensorFlow | 1.13.1 | TensorFlow is an end-to-end open source platform for machine learning. It is developed by Google and by community contributors. |
For complete PowerAI documentation, see https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.0/navigation/pai_getstarted.htm. Here we only show simple examples with system-specific instructions.
Get a node for interactive use:
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module using one of the following:
module load ibm/powerai/1.6.0.py2 # for python2 environment
module load ibm/powerai/1.6.0.py3 # for python3 environment
module load ibm/powerai # python3 environment by default
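After loading the module, it can be useful to confirm which frameworks the active Python environment can actually see. The following is a minimal sketch, assuming the Python 3 environment and assuming the usual import names (`caffe`, `caffe2`, `torch`, `tensorflow`) for the frameworks in the table above; the helper name `available_frameworks` is hypothetical.

```python
import importlib.util


def available_frameworks(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the frameworks listed in the table above.
    frameworks = ["caffe", "caffe2", "torch", "tensorflow"]
    for name, found in available_frameworks(frameworks).items():
        print(f"{name:<12}{'found' if found else 'missing'}")
```

The check only looks up each module without importing it, so it is quick to run and does not initialize any GPU.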
Install samples for Caffe:
caffe-install-samples ~/caffe-samples
cd ~/caffe-samples
Download data for MNIST model:
./data/mnist/get_mnist.sh
Convert data and create MNIST model:
./examples/mnist/create_mnist.sh
Train LeNet on MNIST:
./examples/mnist/train_lenet.sh
The same can be accomplished in batch mode using the following caffe_sample.sb script:
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.sb
sbatch caffe_sample.sb
squeue
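The contents of caffe_sample.sb are not reproduced on this page. As a rough illustration of how the interactive steps map onto a batch job, the sketch below generates a Slurm script that carries the same resource options as the `srun` command. The function name `make_batch_script`, the job name, and the overall layout are assumptions and may differ from the real script.

```python
def make_batch_script(job_name, commands, module="ibm/powerai",
                      partition="debug", gres="gpu:v100:1",
                      ntasks_per_node=8, time_limit="01:30:00"):
    """Return the text of a Slurm batch script that runs `commands` in order."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --gres={gres}",
        f"#SBATCH --time={time_limit}",
        "",
        f"module load {module}",
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


# The interactive Caffe steps from this page, in the same order.
caffe_steps = [
    "caffe-install-samples ~/caffe-samples",
    "cd ~/caffe-samples",
    "./data/mnist/get_mnist.sh",
    "./examples/mnist/create_mnist.sh",
    "./examples/mnist/train_lenet.sh",
]
print(make_batch_script("caffe_sample", caffe_steps))
```

Writing the returned text to a file and passing that file to `sbatch` would queue the same steps non-interactively.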
Get a node for interactive use:
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module using one of the following:
module load ibm/powerai/1.6.0.py2 # for python2 environment
module load ibm/powerai/1.6.0.py3 # for python3 environment
module load ibm/powerai # python3 environment by default
Install samples for Caffe2:
caffe2-install-samples ~/caffe2-samples
cd ~/caffe2-samples
Create a sample LMDB database to use as training data:
python ./examples/lmdb_create_example.py --output_file lmdb
Train ResNet50 with Caffe2:
python ./examples/resnet50_trainer.py --train_data ./lmdb
The same can be accomplished in batch mode using the following caffe2_sample.sb script:
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.sb
sbatch caffe2_sample.sb
squeue
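On success, `sbatch` prints a line of the form `Submitted batch job <id>`. When several jobs are queued, a small helper can extract that ID so that `squeue -j <id>` lists only the job of interest. This is a sketch only; the function name `parse_job_id` is hypothetical and the job number is a made-up sample.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return int(match.group(1))


# Sample text only; a real run would capture the output of sbatch.
job_id = parse_job_id("Submitted batch job 12345\n")
print(f"squeue -j {job_id}")  # command that lists only this job
```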
Get a node for interactive use:
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module using one of the following:
module load ibm/powerai/1.6.0.py2 # for python2 environment
module load ibm/powerai/1.6.0.py3 # for python3 environment
module load ibm/powerai # python3 environment by default
Copy the following code into a file named "mnist-demo.py":
import tensorflow as tf

# Load MNIST and scale pixel values from [0, 255] to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Flatten each 28x28 image, then apply one hidden layer with dropout and a 10-way softmax output
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report loss and accuracy on the test set
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
Train on MNIST with the Keras API:
python ./mnist-demo.py
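The loss reported during training is `sparse_categorical_crossentropy` applied to the softmax output of the last layer. The pure-Python sketch below shows what that quantity measures for a single example with integer labels. It is an illustration under simplifying assumptions, not the Keras implementation, which additionally averages over the batch and clips probabilities; the scores are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
loss = sparse_categorical_crossentropy(0, probs)
print(probs, loss)
```

The loss is small when the model assigns a high probability to the correct class and grows as that probability approaches zero.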
The same can be accomplished in batch mode using the following tf_sample.sb script:
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.sb
sbatch tf_sample.sb
squeue
Get a node for interactive use:
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module using one of the following:
module load ibm/powerai/1.6.0.py2 # for python2 environment
module load ibm/powerai/1.6.0.py3 # for python3 environment
module load ibm/powerai # python3 environment by default
Download the script mnist-with-summaries.py to your $HOME directory:
cd ~
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/mnist-with-summaries.py
Train on MNIST with TensorFlow summaries, then return to the login node:
python ./mnist-with-summaries.py
exit
The same can be accomplished in batch mode using the following tfbd_sample.sb script:
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.sb
sbatch tfbd_sample.sb
squeue
After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on the login node:
tensorboard --logdir ~/tensorflow/mnist/logs/
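If TensorBoard starts but shows no data, a quick check is whether the job actually wrote event files. TensorFlow names these `events.out.tfevents.<timestamp>.<hostname>`. The sketch below lists any such files under the log directory; the helper name `find_event_files` is hypothetical.

```python
import os


def find_event_files(logdir):
    """Return sorted paths of TensorBoard event files found under logdir."""
    found = []
    for root, _dirs, files in os.walk(os.path.expanduser(logdir)):
        for name in files:
            if "tfevents" in name:
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Log directory used by the TensorBoard command on this page.
    for path in find_event_files("~/tensorflow/mnist/logs"):
        print(path)
```

An empty listing suggests the training job did not finish or wrote its summaries to a different directory.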
Forward port 6006 on the remote machine to port 16006 on the local machine. Run this command on your local machine, substituting your own username for dmu:
ssh -N -f -L localhost:16006:localhost:6006 dmu@hal
Paste the following address into a web browser to start the TensorBoard session:
localhost:16006
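When the tunnel is set up often, or for different users and ports, the ssh command can be assembled programmatically. This is a minimal sketch: the helper name `tunnel_command` is hypothetical, and the defaults simply mirror the ports and the sample login used above.

```python
def tunnel_command(user, host, local_port=16006, remote_port=6006):
    """Build the ssh argument list that forwards local_port to remote_port."""
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    # -N: run no remote command, -f: go to background, -L: local port forward
    return ["ssh", "-N", "-f", "-L", forward, f"{user}@{host}"]


cmd = tunnel_command("dmu", "hal")
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run` on the local machine. Choosing a different `local_port` helps when 16006 is already in use.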
Get a node for interactive use:
srun --partition=debug --pty --nodes=1 --ntasks-per-node=8 --gres=gpu:v100:1 -t 01:30:00 --wait=0 --export=ALL /bin/bash
Once on the compute node, load the PowerAI module using one of the following:
module load ibm/powerai/1.6.0.py2 # for python2 environment
module load ibm/powerai/1.6.0.py3 # for python3 environment
module load ibm/powerai # python3 environment by default
Install samples for PyTorch:
pytorch-install-samples ~/pytorch-samples
cd ~/pytorch-samples
Train on MNIST with PyTorch:
python ./examples/mnist/main.py
The same can be accomplished in batch mode using the following pytorch_sample.sb script:
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.sb
sbatch pytorch_sample.sb
squeue