The C3 AI Suite has Types specifically designed to facilitate certain machine learning pipelines, taking their inspiration from the scikit-learn machine learning pipeline. At the most general level, the `MLPipe` Type defines the basic behavior of a pipeline through the `train`, `process`, and `score` methods, and various specializations define a whole pipeline of operations. With this structure, you can call `train` on the top-level pipeline Type and the whole pipeline is trained. The same holds for `process` and `score`, which process inputs and score results. We'll start by discussing the basic Types which are available, beginning with the abstract Types forming the basis of this machine learning system, then some concrete implementations of those Types, and finally some examples. We'll then discuss how a new machine learning model may be ported onto the C3 AI platform.
MLPipe: An abstract Type which defines general behaviors like train, process, and score. This Type is mixed in by nearly all Types in the C3 AI Machine Learning space.
CustomPythonMLPipe: A helper Type that acts as a 'base' for defining new Python-based machine learning Pipes.
MLSerialPipeline: This is the concrete implementation of the MLPipeline Type. Since MLSerialPipeline is so general, you won't have to subclass it.
XgBoostPipe: This Type implements the sklearn-compatible part of the XGBoost library. See the
Let's take a look at the C3 AI-developed TutorialIntroMLPipeline.ipynb notebook.
The first step in any machine learning task is to prepare the data. In this case, we're going to use the popular Iris dataset. We first use some sklearn functions to load the data and split it into a training set and a testing set.
XTrain, XTest, yTrain, yTest = [c3.Dataset.fromPython(pythonData=ds_np) for ds_np in datasets_np]
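The `datasets_np` list consumed by the line above could be produced with plain scikit-learn along the following lines. This is a minimal sketch rather than the notebook's own code: the 20% test fraction, the `random_state` value, and the `*_np` variable names are assumptions chosen for illustration, and the `c3.Dataset.fromPython` conversion is left out because it needs a live C3 AI environment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris measurements (150 samples, 4 features) and class labels.
iris = load_iris()

# Hold out 20% of the samples for testing. The split ratio and the seed
# are illustrative assumptions, not values taken from the notebook.
datasets_np = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# train_test_split returns [X_train, X_test, y_train, y_test] in that
# order, which matches the unpacking order XTrain, XTest, yTrain, yTest.
X_train_np, X_test_np, y_train_np, y_test_np = datasets_np
print(X_train_np.shape, X_test_np.shape)  # (120, 4) (30, 4)
```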
Defining the C3 MLPipeline
C3 AI Machine Learning Pipelines can be thought of as a series of steps. These steps can be nested, so we can define a 'preprocessing' step (perhaps containing multiple steps itself) which can normalize and transform data into a better form for ML algorithms, and a regression step which runs the ML model on the transformed data. So, we'll build the MLPipeline step by step.
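To make the idea of nested steps concrete, here is a toy sketch in plain Python. It is not the C3 AI API: the `Scale`, `Shift`, and `SerialPipeline` classes are hypothetical stand-ins that only mimic the `train` / `process` calling pattern described above, so that a pipeline can be used as a step inside another pipeline.

```python
class Scale:
    """Leaf step: learns the largest absolute value and divides by it."""

    def train(self, xs):
        self.factor = max(abs(x) for x in xs) or 1.0
        return self

    def process(self, xs):
        return [x / self.factor for x in xs]


class Shift:
    """Leaf step: learns the mean and subtracts it."""

    def train(self, xs):
        self.mean = sum(xs) / len(xs)
        return self

    def process(self, xs):
        return [x - self.mean for x in xs]


class SerialPipeline:
    """Runs its steps in order; a step may itself be a SerialPipeline."""

    def __init__(self, steps):
        self.steps = steps

    def train(self, xs):
        # Train each step on the output of the steps trained before it.
        for step in self.steps:
            xs = step.train(xs).process(xs)
        return self

    def process(self, xs):
        for step in self.steps:
            xs = step.process(xs)
        return xs


preprocess = SerialPipeline([Scale()])        # inner (nested) pipeline
full = SerialPipeline([preprocess, Shift()])  # outer pipeline
full.train([2.0, 4.0, 6.0, 8.0])              # one call trains every step
print(full.process([8.0]))                    # [0.375]
```

Because the outer pipeline only relies on `train` and `process`, it does not need to know whether a given step is a single model or a whole sub-pipeline.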
Let's build a preprocessing pipeline which first scales the data to the interval [0, 1], then performs a principal component analysis (PCA) and extracts the first two principal components. These components will be easy for a machine learning algorithm to use.
Now we have `preprocessPipeline` containing our preprocessing pipeline! We can use this pipeline as part of a larger pipeline now.
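For readers more familiar with scikit-learn, the same two preprocessing steps can be sketched with `sklearn.pipeline.Pipeline`. This is an analogue under stated assumptions, not the C3 AI definition: the variable name `sk_preprocess` and the step names are made up for illustration, and `MinMaxScaler` is assumed as the scaler because the text calls for scaling to [0, 1].

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X = load_iris().data  # 150 samples, 4 features

# Step 1 rescales every feature to [0, 1]; step 2 keeps the first two
# principal components of the rescaled data.
sk_preprocess = Pipeline(steps=[
    ("minMaxScaler", MinMaxScaler(feature_range=(0, 1))),
    ("pca", PCA(n_components=2)),
])

X_reduced = sk_preprocess.fit_transform(X)
print(X_reduced.shape)  # (150, 2)
```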
Regression Leaf Pipe
We now need to define our logistic regression model. We'll create an SklearnPipe (a concrete MLLeafPipe) which wraps this model:
This SklearnPipe now contains the sklearn model.
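For reference, the underlying scikit-learn model can be exercised on its own as follows. This sketch shows only the plain sklearn estimator, not the SklearnPipe wrapper; the `max_iter=1000` setting is an assumption made so the solver converges on unscaled data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A multi-class logistic regression classifier. max_iter is raised from
# the default as an illustrative choice, not a value from the notebook.
logistic = LogisticRegression(max_iter=1000)
logistic.fit(X, y)

print(logistic.classes_)        # the three Iris class labels: [0 1 2]
print(logistic.predict(X[:3]))  # predicted labels for the first 3 samples
```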
Compose full Pipeline
Now, we build the final pipeline:
Training the Pipeline
Once our MLPipeline has been defined we can use the training data from before to train it:
score = trainedLr.score(input=XTest, targetOutput=yTest)
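The compose, train, and score sequence can be mirrored end to end in plain scikit-learn. Treat this as a rough analogue rather than the C3 AI code: all variable and step names are hypothetical, the split parameters are assumptions, and sklearn's `score` reports mean accuracy, which may differ from the metric the C3 AI `score` method is configured to use.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Inner pipeline: scale to [0, 1], then keep two principal components.
sk_preprocess = Pipeline([
    ("minMaxScaler", MinMaxScaler()),
    ("pca", PCA(n_components=2)),
])

# Outer pipeline: the nested preprocessing step followed by the model.
sk_full = Pipeline([
    ("preprocess", sk_preprocess),
    ("logisticRegression", LogisticRegression(max_iter=1000)),
])

# One fit call trains every step in order; score evaluates on held-out data.
sk_full.fit(X_train, y_train)
accuracy = sk_full.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```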
Storing and Retrieving the Trained Pipeline
Once a model is trained, you can store it as a persisted MLSerialPipeline Type. You can then retrieve this model later in a different script or different component of the C3 AI Suite. Let's look at storing:
Now you can use `process` on new data with the fetched Pipeline!
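The store-then-fetch round trip can be illustrated outside the platform with Python's `pickle` module. This is a stand-in technique, not how the C3 AI Suite persists Types: here the trained scikit-learn pipeline is serialized to bytes and restored, and the restored copy is checked against the original. All names are hypothetical.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)

trained = Pipeline([
    ("minMaxScaler", MinMaxScaler()),
    ("logisticRegression", LogisticRegression(max_iter=1000)),
]).fit(X, y)

# "Store": serialize the trained pipeline, including learned parameters.
stored = pickle.dumps(trained)

# "Retrieve": rebuild the pipeline later, possibly in another process.
fetched = pickle.loads(stored)

# The fetched pipeline should predict exactly like the original.
same = bool((fetched.predict(X) == trained.predict(X)).all())
print(same)  # True
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.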
Quick Example Using KerasPipe
Using snippets from the DTI-created mnistExample, we'll quickly show how to use the KerasPipe MLLeafPipe.
result = trained_pipe.process(input=test_X)
Several C3 AI-developed Jupyter notebooks demonstrate the usage of these Pipeline Types: