See Getting started with Open Cognitive Environment (OpenCE, formerly WMLCE) for the latest software stack.
IBM Watson Machine Learning Community Edition (WMLCE-1.7.0, WMLCE-1.6.
...
2)
PowerAI WMLCE is an enterprise software distribution that combines popular open-source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers. It includes the following frameworks:
Framework | Version | Description |
---|---|---|
Caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research and by community contributors. |
Caffe2 | n/a | Caffe2 is a companion to PyTorch. PyTorch is great for experimentation and rapid development, while Caffe2 is aimed at production environments. |
PyTorch | 1.3.1 | PyTorch is an open-source deep learning platform that provides a seamless path from research prototyping to production deployment. It is developed by Facebook and by community contributors. |
TensorFlow | 2.1.0 | TensorFlow is an end-to-end open-source platform for machine learning. It is developed by Google and by community contributors. |
For complete PowerAI WMLCE documentation, see https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. This page shows only simple examples with system-specific instructions.
...
Code Block |
---|
module load wmlce/1.6.1-py2.7 # for python2 environment
module load wmlce/1.6.1-py3.6 # for python3 environment
module load wmlce # python3 environment by default |
Install samples for Caffe:
...
The same can be accomplished in batch mode using the following caffe_sample.swb script:
Code Block |
---|
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe_sample.swb
swbatch caffe_sample.swb
squeue |
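The `squeue` command lists your queued and running jobs. If you would rather wait for a job from a script, the following Python sketch polls `squeue` until the job leaves the queue. It assumes Python 3.6 or later (as in the python3 `wmlce` environment) and that you already know the job ID, for example from the submission message or from `squeue`. The helper names `parse_squeue` and `wait_for_job` are hypothetical.

```python
import getpass
import subprocess
import time


def parse_squeue(output):
    """Parse `squeue -h -o "%i %T"` output into a {job_id: state} dict."""
    jobs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            jobs[fields[0]] = fields[1]  # e.g. "123456" -> "RUNNING"
    return jobs


def wait_for_job(job_id, poll_seconds=30):
    """Block until job_id no longer appears in the current user's queue."""
    user = getpass.getuser()
    while True:
        # -h drops the header line; %i is the job ID and %T the job state.
        result = subprocess.run(
            ["squeue", "-u", user, "-h", "-o", "%i %T"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        jobs = parse_squeue(result.stdout)
        if str(job_id) not in jobs:
            return  # job finished (or was never in this user's queue)
        print("job {} is {}".format(job_id, jobs[str(job_id)]))
        time.sleep(poll_seconds)
```

For example, `wait_for_job("123456")` returns once job 123456 is no longer listed; the job ID here is a placeholder.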
Simple Example with Caffe2
Interactive mode
Get a node for interactive use:
...
Code Block |
---|
module load wmlce/1.6.1-py2.7 # for python2 environment
module load wmlce/1.6.1-py3.6 # for python3 environment
module load wmlce # python3 environment by default |
Install samples for Caffe2:
...
The same can be accomplished in batch mode using the following caffe2_sample.swb script:
Code Block |
---|
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/caffe2_sample.swb
swbatch caffe2_sample.swb
squeue |
Simple Example with TensorFlow
Interactive mode
Get a node for interactive use:
...
Code Block |
---|
module load wmlce/1.6.1-py2.7 # for python2 environment
module load wmlce/1.6.1-py3.6 # for python3 environment
module load wmlce # python3 environment by default |
Copy the following code into file "mnist-demo.py":
...
The same can be accomplished in batch mode using the following tf_sample.swb script:
Code Block |
---|
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tf_sample.swb
swbatch tf_sample.swb
squeue |
Visualization with TensorBoard
Interactive mode
Get a node for interactive use:
...
Code Block |
---|
module load wmlce/1.6.1-py2.7 # for python2 environment
module load wmlce/1.6.1-py3.6 # for python3 environment
module load wmlce # python3 environment by default |
Download the code mnist-with-summaries.py to your $HOME folder:
...
Train on MNIST with TensorFlow summaries, then return to the login node:
Code Block |
---|
python ./mnist-with-summaries.py
exit |
Batch mode
The same can be accomplished in batch mode using the following tfbd_sample.swb script:
Code Block |
---|
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/tfbd_sample.swb
swbatch tfbd_sample.swb
squeue |
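Before starting TensorBoard, it can be worth confirming that the job actually wrote summary files. TensorFlow names its summary files `events.out.tfevents.*`; the sketch below searches the log directory used on this page for them. The helper name `find_event_files` is hypothetical, and the default path simply mirrors the `~/tensorflow/mnist/logs` location mentioned for this example.

```python
from pathlib import Path


def find_event_files(logdir="~/tensorflow/mnist/logs"):
    """Return TensorFlow event files under logdir, newest first."""
    root = Path(logdir).expanduser()
    if not root.is_dir():
        return []  # the job has not created the log directory (yet)
    files = [p for p in root.rglob("events.out.tfevents.*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    events = find_event_files()
    if events:
        print("{} event file(s); newest: {}".format(len(events), events[0]))
    else:
        print("No event files found; check the job output before starting TensorBoard.")
```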
Start the TensorBoard session
After the job completes, the TensorFlow log files can be found in "~/tensorflow/mnist/logs". Start the TensorBoard server on the login node:
Code Block |
---|
module load wmlce
tensorboard --logdir ~/tensorflow/mnist/logs/ --port [user_pick_port] # pick a random port number in the range 6500-6999 |
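Instead of guessing a port in the 6500-6999 range, a short Python sketch can pick one that is not currently bound on the login node. `pick_free_port` is a hypothetical helper. Another user can still take the port between this check and TensorBoard starting, so if TensorBoard reports that the port is in use, run it again.

```python
import random
import socket


def pick_free_port(low=6500, high=6999, attempts=50):
    """Return a random port in [low, high] that can currently be bound locally."""
    for _ in range(attempts):
        port = random.randint(low, high)  # randint includes both endpoints
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port already in use; try another one
        return port  # bind succeeded; the socket is closed again here
    raise RuntimeError("no free port found in {}-{}".format(low, high))


if __name__ == "__main__":
    print(pick_free_port())
```

If the sketch is saved as, say, `pick_port.py` (a hypothetical file name), it can be used as `tensorboard --logdir ~/tensorflow/mnist/logs/ --port $(python pick_port.py)`.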
Forward [user_pick_port] on the remote machine to port 16006 on your local machine:
Code Block |
---|
ssh -N -f -L localhost:16006:localhost:[user_pick_port] your_user_name@hal.ncsa.illinois.edu |
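In the command above, `-L localhost:16006:localhost:[user_pick_port]` maps port 16006 on your local machine to `[user_pick_port]` on the HAL login node. The `-N` flag tells ssh not to run a remote command, and `-f` sends it to the background. The sketch below builds the same command from Python and checks whether the local end of the tunnel is accepting connections. It is meant to run on your local machine. `ssh_tunnel_cmd` and `local_port_open` are hypothetical helper names, and `your_user_name` must be replaced with your own account.

```python
import socket


def ssh_tunnel_cmd(user, remote_port, local_port=16006, host="hal.ncsa.illinois.edu"):
    """Build the ssh port-forwarding command shown above as an argument list."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-f",  # go to the background after authentication
        "-L", "localhost:{}:localhost:{}".format(local_port, remote_port),
        "{}@{}".format(user, host),
    ]


def local_port_open(port=16006, host="127.0.0.1", timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `subprocess.run(ssh_tunnel_cmd("your_user_name", 6789), check=True)` opens the tunnel (6789 stands in for your chosen port), after which `local_port_open()` should return True.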
Paste the following address into a web browser to start the TensorBoard session:
Code Block |
---|
localhost:16006 |
Alternatively, use hal-ondemand; for details, refer to Getting started with HAL OnDemand.
Simple Example with PyTorch
Interactive mode
Get a node for interactive use:
...
Code Block |
---|
module load wmlce/1.6.1-py2.7 # for python2 environment
module load wmlce/1.6.1-py3.6 # for python3 environment
module load wmlce # python3 environment by default |
Install samples for PyTorch:
...
The same can be accomplished in batch mode using the following pytorch_sample.swb script:
Code Block |
---|
wget https://wiki.ncsa.illinois.edu/download/attachments/82510352/pytorch_sample.swb
swbatch pytorch_sample.swb
squeue |
Major Installed PowerAI-Related Anaconda Modules
Name | Version | Description |
---|---|---|
caffe | 1.0 | Caffe is a deep learning framework made with expression, speed, and modularity in mind. |
cudatoolkit | 10.2.89 | The NVIDIA® CUDA® Toolkit provides a development environment for creating high-performance GPU-accelerated applications. |
cudnn | 7.6.5+10.2 | The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. |
h5py | 2.8.0 | The h5py package is a Pythonic interface to the HDF5 binary data format. |
jupyter | 1.0.0 | Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages. |
matplotlib | 2.2.3 | Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. |
nccl | 2.5.6 | The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. |
numpy | 1.14.5 | NumPy is the fundamental package for scientific computing with Python. |
opencv | 3.4.8 | OpenCV was designed for computational efficiency and with a strong focus on real-time applications. |
pytables | 3.4.4 | PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. |
pytorch | 1.3.1 | PyTorch enables fast, flexible experimentation and efficient production through a hybrid front-end, distributed training, and ecosystem of tools and libraries. |
scikit-learn | 0.19.1 | Simple and efficient tools for data mining and data analysis. |
scipy | 1.1.0 | SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. |
tensorboard | 2.1 | TensorBoard is a suite of visualization tools that makes it easier to understand, debug, and optimize TensorFlow programs. |
tensorflow-gpu | 2.1.0 | The core open-source library to help you develop and train ML models. |
torchvision | 0.2.1 | The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. |
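The versions above describe one release of the environment. To see what the `wmlce` module you loaded actually provides, a short Python sketch can import each package and read its `__version__` attribute. The helper `installed_versions` and the `MODULES` list are illustrative. System libraries such as cudatoolkit, cuDNN, and NCCL are not Python modules and are not covered, and a package without `__version__` is reported as "unknown".

```python
import importlib

# Import names for the Python packages in the table above; several differ from
# the conda package name (pytorch -> torch, opencv -> cv2, pytables -> tables,
# scikit-learn -> sklearn, tensorflow-gpu -> tensorflow).
MODULES = ["caffe", "cv2", "h5py", "matplotlib", "numpy", "scipy", "sklearn",
           "tables", "tensorboard", "tensorflow", "torch", "torchvision"]


def installed_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not importable in this environment
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

After `module load wmlce`, running `for name, ver in installed_versions(MODULES).items(): print(name, ver)` prints one line per package. Importing TensorFlow and PyTorch can take several seconds.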