The C3 AI Suite provides researchers with many tools to analyze data and to build and deploy machine learning models. This guide explains how to connect to the C3 AI Suite, access data using C3 AI methods, and convert C3 AI method outputs to an easy-to-analyze form. The guide also provides more detailed instructions for DTI members using the Covid-19 Data Lake. Examples in this guide rely on the 'baseCovidDataLake' package available in this git repository.
Please note, this guide covers how to query data from the C3 AI Suite. For more advanced topics such as loading data, building metrics, or configuring and training machine learning models, please refer to the following wikis:
To best understand the C3 AI Suite and this guide, let's introduce key terminology used by C3 AI Suite developers:
Static console: the developer tool available at 'https://<vanity_url>/static/console' (replacing <vanity_url> with your Vanity Url).
The C3 AI Suite is a Platform as a Service (PaaS) that enables organizations to build, deploy, and operate enterprise-scale Big Data, AI, and IoT applications. The C3 AI Suite can be deployed on any private or public cloud infrastructure, such as AWS, Azure, and Google Cloud Platform. When developing and operating applications, a C3 Cluster is responsible for managing and supporting all the features of the C3 AI Suite. A C3 Cluster has at least one Master node and many Worker nodes. Master nodes prioritize and distribute jobs to Worker nodes and handle user requests. Worker nodes carry out the jobs allocated by the Master node. Other components of a C3 Cluster include databases (e.g., Postgres, Cassandra, Azure Blob), logging services (i.e., Splunk), and Jupyter.
Atop these hardware or virtualized cloud resources is a logical software structure, with the highest level being a Cluster. A C3 Cluster is divided into numerous tenants. Tenants are logically separated from each other (i.e., a particular tenant's data and packages are not accessible or visible to any other tenant) and contain many tags. Tags host C3 AI Packages (i.e., the code that C3 AI developers write and provision to the C3 AI Suite). A typical multi-tag, multi-tenant C3 Cluster is shown in the logical diagram below:
To learn more about the architecture of a C3 cluster, please see the training materials here:
To provision a package to your tag, follow the instructions available at the DTI Guide: Provisioning.
To run the examples in this guide, you will need to provision the 'baseCovidDataLake' package by following the directions in the 'COVID-19 Data Lake Provisioning' section.
To learn more about provisioning, please see the C3 AI Develop Documentation here:
The static console is the main tool that developers use to interact with the C3 AI Suite. However, we anticipate that most DTI members will use Python (via Jupyter notebook) for data analysis. That being said, the static console is an essential part of working with the C3 AI Suite and you will use it frequently. For example, the static console is the best place to find documentation tailored directly to your package. It's also a great place to quickly test queries as no specialized environments need to be set up to use it. Static console is ready-to-go in all modern browsers, including Google Chrome, Mozilla Firefox, and Apple Safari.
Once you have provisioned a package to your tag, navigate to the static console page at this URL: 'https://<vanity_url>/static/console' (replacing <vanity_url> with the Vanity Url provided in your C3.ai DTI Training Cluster Onboarding Email). The static console page looks like this:
The 'Tools' drop-down menu in the upper left-hand corner contains a list of available developer tools. The most relevant tool is the Provisioner, though there are also utilities for loading JavaScript files, debugging JS code, and inspecting Errors.
The 'Help' drop-down menu in the upper left-hand corner allows users to access console documentation and a C3 Cluster hosted documentation portal.
Most tools are also accessible through a series of icons in the upper right-hand corner:
Developers interact with the static console through the JavaScript console tab in the browser. When the static console page loads (or when you run the c3ImportAll() command), the JavaScript methods associated with all of your package's defined Types are populated. You can write and run JavaScript code directly in the console tab to interact with your package.
You can open the JavaScript console with the 'Ctrl+Shift+I' keyboard shortcut (in most browsers) or through the browser's developer tools. If the 'Ctrl+Shift+I' keyboard shortcut doesn't work for you, review your browser's documentation on developer tools. Here's how the static console looks in Firefox, with the JavaScript console open:
Finally, let's write some JavaScript commands to see the console in action!
The DTI Team have recorded a short video introducing and describing the static console functionality:
Here are common JavaScript console commands used on the static console page.
c3ImportAll: populates the JavaScript methods for your package's Types. Run c3ImportAll() after provisioning a new package.
c3Viz: visualizes evaluated results (e.g., EvalMetricsResult).
c3ShowType: shows the documentation for a C3 Type (e.g., c3ShowType(OutbreakLocation)).
We anticipate most DTI researchers will want to use Python for data analysis. There are two options to connect to a C3 Cluster via Python. Please follow the links below for detailed information about each.
To learn more about the general structure of a C3 cluster, please see the resources here:
All data in the C3 AI Suite are stored in C3 Types. Users can access data from a Type with the 'fetch' method. Behind the scenes, the 'fetch' method submits a query directly to the database underlying a Type, then retrieves and presents the query results.
The C3 AI Suite returns the 'fetch' query's response in the FetchResult type for data analysis (see example below). The response includes the requested data together with metadata about the 'fetch' query (e.g., the number of objects, whether additional data exists in the database).
To learn more about the 'fetch' method, please see the C3 AI resources here:
Users can also provide a FetchSpec (or parameters) to the 'fetch' method to describe particular data to retrieve (e.g., only retrieve gene sequences collected in Germany). The FetchSpec can be empty (e.g., OutbreakLocation.fetch()), or contain several parameters to return a subset of the data.
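Some example FetchSpec parameters include:
filter: an expression that selects which records to return (e.g., 'exists(latestTotalPopulation)').
limit: the maximum number of records to return. This is useful for testing the 'fetch' method without returning too many records.
order: the order in which to return records (e.g., 'descending(latestTotalPopulation)').
include: the fields to return for each record. If no include spec is defined, all fields from the Type will be returned.
Note: Please see the official FetchSpec documentation for a full list of parameters: https://developer.c3.ai/docs/7.12.17/type/FetchSpec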
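As a small illustration of how these parameters fit together, the sketch below assembles a FetchSpec-style dictionary in plain Python. The helper name build_fetch_spec is hypothetical (it is not part of the C3 AI Suite); the keys 'filter', 'order', 'include', and 'limit' are the ones used in this guide's fetch examples.

```python
def build_fetch_spec(filter_expr=None, order=None, include=None, limit=None):
    """Assemble a FetchSpec-style dictionary, omitting parameters left unset.

    Hypothetical helper for illustration; it only builds a plain dict.
    """
    spec = {}
    if filter_expr is not None:
        spec['filter'] = filter_expr
    if order is not None:
        spec['order'] = order
    if include is not None:
        # Join a list of field names into the comma-separated string
        # form used in this guide's examples.
        spec['include'] = ', '.join(include)
    if limit is not None:
        spec['limit'] = limit
    return spec


spec = build_fetch_spec(
    filter_expr='exists(latestTotalPopulation)',
    order='descending(latestTotalPopulation)',
    include=['id', 'name', 'latestTotalPopulation'],
    limit=10,
)
print(spec)
```

In a notebook connected to a C3 cluster, a dictionary built this way could be passed to a call such as c3.OutbreakLocation.fetch(spec).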
The OutbreakLocation Type contains information from various locations for which the Covid-19 Data Lake has virus-related information. We can fetch OutbreakLocation records for which the 'latestTotalPopulation' field exists (i.e., is not null). We can also retrieve these records in descending order by their 'latestTotalPopulation':
var res = OutbreakLocation.fetch({
    'limit': -1,
    'filter': 'exists(latestTotalPopulation)',
    'order': 'descending(latestTotalPopulation)',
    'include': 'id, name, latestTotalPopulation, populationOfAllChildren, countryArea, countryCode'
})
We can then show these results in the C3 AI static console using the c3Grid command:
You can run this same fetch in Python:
raw_data = c3.OutbreakLocation.fetch({
    'limit': -1,
    'filter': 'exists(latestTotalPopulation)',
    'order': 'descending(latestTotalPopulation)',
    'include': 'id, name, latestTotalPopulation, populationOfAllChildren, countryArea, countryCode'
})
Additional details on "Fetching in Python" are available in this C3 AI Developer documentation: https://developer.c3.ai/docs/7.12.25/topic/ds-jupyter-notebooks
Additional examples of fetch calls can be found here:
This tutorial video goes over fetching and filtering:
Another useful command is 'fetchCount'. As with 'fetch', users can provide a FetchSpec (or parameters) to 'fetchCount'. The 'fetchCount' method then returns the number of records that match the FetchSpec. This is useful when trying to determine whether a given search is refined enough.
OutbreakLocation.fetchCount({'filter': 'exists(latestTotalPopulation)'})
You can run the same 'fetchCount' in Python:
c3.OutbreakLocation.fetchCount(spec={'filter': 'exists(latestTotalPopulation)'})
To learn more about the 'fetchCount' method, please see the fetchCount method definition in the Persistable Type documentation: https://developer.c3.ai/docs/7.12.25/type/Persistable
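One way to apply this check in Python is to call 'fetchCount' first and only run the full 'fetch' when the count is small enough. The sketch below assumes an object exposing fetchCount(spec=...) and fetch(...) in the same way as the examples above. The names fetch_if_small and FakeType and the max_records threshold are hypothetical, introduced only for illustration; FakeType merely stands in for a real C3 Type so that the snippet runs without a cluster.

```python
def fetch_if_small(c3_type, spec, max_records=1000):
    """Run fetch only when fetchCount reports at most max_records matches.

    Returns a (result, count) pair; result is None when the count is too large.
    """
    count = c3_type.fetchCount(spec=spec)
    if count > max_records:
        return None, count
    return c3_type.fetch(spec), count


class FakeType:
    """Hypothetical stand-in for a C3 Type, so the sketch runs without a cluster."""

    def __init__(self, rows):
        self._rows = rows

    def fetchCount(self, spec=None):
        # A real C3 Type would apply the spec's filter; this stub ignores it.
        return len(self._rows)

    def fetch(self, spec=None):
        # A real fetch returns a FetchResult; this stub returns a plain list.
        return list(self._rows)


spec = {'filter': 'exists(latestTotalPopulation)'}
small = FakeType(rows=[{'id': 'a'}, {'id': 'b'}])
result, count = fetch_if_small(small, spec, max_records=10)
print(count, result)
```

If the count exceeds the threshold, the caller can tighten the filter and try again rather than pulling every record.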
When using a Jupyter Notebook, C3 AI developers typically modify FetchResults for data analysis. This section shows a couple of ways to convert FetchResults into easy-to-analyze forms.
In Python, first retrieve the 'objs' field from the FetchResults object, and then call the toJson() function. The toJson() function returns a list of dictionaries, each with keys equal to the requested fields of the fetched C3 Type. Using the Pandas library, this list can be turned into an analysis-ready DataFrame, as the example below shows:
import pandas as pd
df = pd.DataFrame(raw_data.objs.toJson())
df.head()
df = df.drop(columns=['meta', 'type', 'version', 'id'])
df.head()
Users can then manipulate the resulting DataFrame, using common programming libraries and frameworks.
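The same cleanup can be sketched without pandas, which may help when inspecting a handful of records. The snippet assumes records shaped like the list of dictionaries described above. The sample values are placeholders rather than real Covid-19 Data Lake data, and strip_bookkeeping is a hypothetical helper name.

```python
# Keys dropped in the pandas example above.
BOOKKEEPING_KEYS = frozenset({'meta', 'type', 'version', 'id'})


def strip_bookkeeping(records, drop=BOOKKEEPING_KEYS):
    """Return copies of the records without the bookkeeping keys."""
    return [
        {key: value for key, value in record.items() if key not in drop}
        for record in records
    ]


# Placeholder records for illustration only (not real data).
records = [
    {'type': 'OutbreakLocation', 'id': 'PlaceA', 'name': 'PlaceA',
     'meta': {}, 'version': 1, 'latestTotalPopulation': 100},
    {'type': 'OutbreakLocation', 'id': 'PlaceB', 'name': 'PlaceB',
     'meta': {}, 'version': 1, 'latestTotalPopulation': 50},
]

cleaned = strip_bookkeeping(records)
print(cleaned)
```

The original records are left unchanged, so the raw output remains available if a dropped field turns out to be needed later.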
The C3 AI Suite also provides a pre-built library of "ExpressionEngineFunctions". Expression Engine Functions take a variety of arguments and perform various data processing tasks. For example, the function 'contains' takes two strings as arguments and checks whether the first argument contains the second argument. The function 'lowerCase' takes a string as input and returns that same string in all lowercase letters. In addition to these string processing functions, the C3 AI Suite's ExpressionEngine includes many math functions (such as 'log', 'avg', and 'abs') which operate on various input data types (e.g., int, double, float).
The ExpressionEngine Functions are used in several places, such as:
'fetch' filters
tsDecl metric values
To learn more about ExpressionEngineFunctions, please see the C3 AI resources here:
Using the 'evaluate' method, developers can run aggregations or other computations on data fetched from a C3 Type (e.g., compute the average area across all countries with area data available in the OutbreakLocation Type).
The 'evaluate' method takes several parameters:
projection: the columns/fields, calculated or otherwise, that the 'evaluate' method should return. Projections may use aggregation functions (e.g., avg, unique, min, max).
group: the columns by which to group the results (e.g., the 'locationType' field in OutbreakLocation). Please note, in any 'evaluate' command, all columns in the 'group' field MUST ALSO BE in the 'projection' field, as the example below shows.
order: the columns by which to order the results. Please note, in any 'evaluate' command, all columns in the 'order' field MUST ALSO BE in the 'projection' field.
filter: an expression that restricts which records are considered when the evaluate method is run.
In static console, 'c3Grid' displays the 'evaluate' method results nicely:
(Note: the 'locationType' expression within the 'group' field is also within the 'projection' field. This is required.)
var eval_result = OutbreakLocation.evaluate({
    'projection': 'avg(countryArea), locationType',
    'group': 'locationType',
    'filter': 'exists(countryArea) && exists(locationType)'
})
c3Grid(eval_result)
Users can also run the 'evaluate' method in Python. In this case, users often modify the 'evaluate' method's results for data analysis. To view and analyze the 'evaluate' method's results in Python, please use the helper function available in C3 DTI's c3python module here: https://github.com/c3aidti/c3python
NOTE: The 'locationType' expression within the 'group' field is also within the 'projection' field. This is required.
import c3python  # helper module from https://github.com/c3aidti/c3python

eval_spec = {
    'projection': 'avg(countryArea), locationType',
    'group': 'locationType',
    'filter': 'exists(countryArea) && exists(locationType)'
}
eval_res = c3.OutbreakLocation.evaluate(eval_spec)
df = c3python.EvaluateResultToPandas(result=eval_res, eval_spec=eval_spec)
Here's another example of running the 'evaluate' method in Python, this time using the 'order' parameter as well:
NOTE: The 'count(ethnicity)' expression within the 'order' field is also within the 'projection' field. This is required.
spec = c3.EvaluateSpec(
    projection="ethnicity, count(ethnicity)",
    order='descending(count(ethnicity))',
    group="ethnicity"
)
c3python.EvaluateResultToPandas(result=c3.SurveyData.evaluate(spec), eval_spec=spec)
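The rule that every 'group' and 'order' expression must also appear in 'projection' can be checked locally before calling 'evaluate'. The following is a minimal sketch in plain Python, not part of the C3 AI Suite. The function names are hypothetical, expressions are compared as exact strings (so differences in whitespace inside an expression would be reported as mismatches), and treating 'ascending(...)' as a wrapper alongside the 'descending(...)' form shown above is an assumption.

```python
def split_top_level(expr):
    """Split a comma-separated expression list, ignoring commas inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def strip_direction(expr):
    """Remove an outer ascending(...) or descending(...) wrapper, if present."""
    for wrapper in ('ascending(', 'descending('):
        if expr.startswith(wrapper) and expr.endswith(')'):
            return expr[len(wrapper):-1].strip()
    return expr


def missing_from_projection(spec):
    """List 'group' and 'order' expressions that are absent from 'projection'."""
    projection = set(split_top_level(spec.get('projection', '')))
    needed = split_top_level(spec.get('group', ''))
    needed += [strip_direction(e) for e in split_top_level(spec.get('order', ''))]
    return [e for e in needed if e not in projection]


good_spec = {
    'projection': 'ethnicity, count(ethnicity)',
    'order': 'descending(count(ethnicity))',
    'group': 'ethnicity',
}
bad_spec = {'projection': 'avg(countryArea)', 'group': 'locationType'}

print(missing_from_projection(good_spec))
print(missing_from_projection(bad_spec))
```

An empty list means the spec satisfies the rule; any returned expression would need to be added to 'projection'.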
To learn more about the 'evaluate' method, please see the C3 AI resources here:
The C3 AI Suite also offers several features to handle time series data. To interact with time series, C3 AI developers typically use simple and compound metrics. These metrics are used in several places in the C3 AI Suite such as:
To supplement the documentation below, we also have recorded a video lecture about Time Series data on the C3 AI Platform.
Simple metrics allow developers to produce time series from raw data and, in practice, are often used to construct more advanced metrics (i.e., Compound Metrics). Simple metrics are linked to a specific C3 Type and reference the time series data stored within that Type. To declare a simple metric, users should specify the following fields:
srcType: the C3 Type the simple metric is linked to (e.g., OutbreakLocation).
path: the path from the srcType to the C3 Type that stores the raw data referenced by the simple metric (e.g., pointMeasurements). If the srcType itself stores the raw data referenced by the simple metric, the path field is optional.
Here is an example of a Simple Metric:
met = c3.SimpleMetric(
    id='JHU_ConfirmedCases2_OutbreakLocation',
    name='JHU_ConfirmedCases2',
    srcType='OutbreakLocation',
    path="aggregateMeasurements.(measurementType == 'confirmed' && origin == "
         "'Johns Hopkins University')",
    expression='interpolate(avg(avg(normalized.data.value)), "PREVIOUS", "MISSING")'
)
To learn more about Simple Metrics, please see the C3 AI resources here:
Another type of SimpleMetric is a tsDecl (Timeseries Declaration) metric. tsDecl metrics are often used to turn non-time series raw data (e.g., event data, status data, or data with irregular intervals) into time series. tsDecl metrics have the same fields as a standard SimpleMetric, except for the 'tsDecl' field, which replaces the 'expression' field. tsDecl metrics may give users added flexibility to define new metrics that the expression field may not support. Using a tsDecl metric, the above metric can be rewritten as:
met = c3.SimpleMetric(
    id='JHU_ConfirmedCases3_OutbreakLocation',
    name='JHU_ConfirmedCases3',
    srcType='OutbreakLocation',
    path="aggregateMeasurements.(measurementType == 'confirmed' && origin == "
         "'Johns Hopkins University')",
    tsDecl={
        'data': 'data',
        'treatment': 'AVG',
        'start': 'start',
        'value': 'value'
    }
)
To learn more about tsDecl metrics, please see the C3 AI resources here:
Compound metrics allow C3 AI developers to manipulate or combine existing metrics into more complex time series. Compound metrics are built on top of one or many existing Simple or Compound metrics. Please note, to evaluate a Compound metric on a C3 Type, all Simple metrics used in that Compound metric must be defined on that Type as well. If not, an error is returned.
To declare a compound metric, users should specify the following fields:
An example CompoundMetric is:
met = c3.CompoundMetric(
    id='JHU_CaseFatalityRate',
    name='JHU_CaseFatalityRate',
    expression='JHU_ConfirmedDeaths/JHU_ConfirmedCases',
)
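Because every metric used in a Compound metric must be defined on the Type being evaluated, it can be useful to list the metric names an expression refers to before evaluating it. The sketch below is a rough, local check in plain Python and makes several assumptions: metric names are treated as identifiers that are not immediately followed by '(' (so function names are skipped), the real expression syntax may include tokens this simple pattern does not handle, and the set of defined metric names is a hypothetical example rather than output from a cluster.

```python
import re

# An identifier not followed by "(" is treated as a metric name;
# identifiers followed by "(" are assumed to be function calls.
METRIC_NAME = re.compile(r'\b[A-Za-z_]\w*\b(?!\s*\()')


def referenced_metrics(expression):
    """Return the metric-like names that appear in a compound expression."""
    return METRIC_NAME.findall(expression)


def undefined_metrics(expression, defined_names):
    """Return referenced names that are missing from defined_names."""
    return [name for name in referenced_metrics(expression)
            if name not in defined_names]


expression = 'JHU_ConfirmedDeaths/JHU_ConfirmedCases'
# Hypothetical set of metric names assumed to be defined on the Type.
defined = {'JHU_ConfirmedCases'}

print(referenced_metrics(expression))
print(undefined_metrics(expression, defined))
```

In practice the set of defined names would come from the metrics actually available on the Type; any name reported as undefined here would be expected to trigger the error described above.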
To learn more about Compound metrics, please see the C3 AI resources here:
Users can find, evaluate, and visualize metrics built in the C3 AI Suite via the JavaScript console or a hosted Jupyter notebook.
All metrics that users build and deploy in the C3 AI Suite are also stored in C3 Types. To view a list of all the simple and compound metrics applicable to a C3 Type, run the 'listMetrics' method as shown below:
JavaScript:
var metrics = OutbreakLocation.listMetrics()
c3Grid(metrics)
Python:
import pandas as pd
pd.DataFrame(c3.OutbreakLocation.listMetrics().toJson())
DTI Members using the Covid-19 Data Lake: While listMetrics does return a list, the output is fairly bare-bones if the 'description' field of a given metric isn't filled in. The Covid-19 Data Lake API documentation provides an extensive list of production-ready metrics along with detailed descriptions and usage examples.
After finding a metric, the next step is to evaluate it on data in a C3 Type.
Metrics are evaluated with either the 'evalMetrics' or 'evalMetricsWithMetadata' methods. Behind the scenes, 'evalMetrics' and 'evalMetricsWithMetadata' fetch and transform raw data from a C3 Type into easy-to-analyze time series data. 'evalMetrics' is used to evaluate metrics provisioned (deployed) to a tenant/tag. 'evalMetricsWithMetadata' allows users to evaluate metrics that are either provisioned to a tenant/tag or defined on-the-fly in the JavaScript console or a hosted Jupyter notebook (typically for debugging).
To learn more about the differences between 'evalMetrics' and 'evalMetricsWithMetadata', see the documentation here: https://developer.c3.ai/docs/7.12.25/type/MetricEvaluatable
To evaluate a metric, users must provide the following parameters (called an EvalMetricsSpec) to the 'evalMetrics' or 'evalMetricsWithMetadata' methods:
Here's an example of evaluating a metric in Python:
spec = c3.EvalMetricsSpec(
    ids=['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates'],
    expressions=['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
    start='2020-01-01',
    end='2020-08-01',
    interval='DAY',
)
results = c3.OutbreakLocation.evalMetrics(spec=spec)
In Python, you can also specify the spec using a Dictionary without creating an EvalMetricsSpec Type:
results = c3.OutbreakLocation.evalMetrics(spec={
    'ids': ['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates'],
    'expressions': ['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
    'start': '2020-01-01',
    'end': '2020-08-01',
    'interval': 'DAY',
})
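When the same spec is built repeatedly with different ids or date ranges, a small helper can catch simple mistakes before the request is sent. The sketch below is a hypothetical convenience function, not part of the C3 AI Suite. It builds the dictionary form shown above, and its checks (non-empty ids and expressions, start date before end date) are local sanity checks assumed to be sensible rather than documented C3 requirements.

```python
from datetime import date


def build_eval_metrics_spec(ids, expressions, start, end, interval='DAY'):
    """Build an EvalMetricsSpec-style dictionary after basic sanity checks.

    start and end are ISO date strings such as '2020-01-01'.
    """
    if not ids:
        raise ValueError('at least one id is required')
    if not expressions:
        raise ValueError('at least one metric expression is required')
    if date.fromisoformat(start) >= date.fromisoformat(end):
        raise ValueError('start must be earlier than end')
    return {
        'ids': list(ids),
        'expressions': list(expressions),
        'start': start,
        'end': end,
        'interval': interval,
    }


spec_dict = build_eval_metrics_spec(
    ids=['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates'],
    expressions=['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
    start='2020-01-01',
    end='2020-08-01',
)
print(spec_dict)
```

The resulting dictionary has the same shape as the one passed to c3.OutbreakLocation.evalMetrics(spec=...) in the example above.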
The C3 AI Suite returns the evaluated metric results (a time series) in the 'EvalMetricsResult' type. With various helper functions, C3 AI developers may then convert this time series into a Pandas DataFrame (via the "Dataset" type) for further data analysis or model development in a Jupyter notebook, as shown below:
ds = c3.Dataset.fromEvalMetricsResult(result=results)
df = c3.Dataset.toPandas(dataset=ds)
Additionally, users can visualize evaluated metric results directly in the web browser (i.e., the JavaScript console) with the 'c3Viz' function.
Here's an example of evaluating and visualizing in JavaScript console:
var spec = EvalMetricsSpec.make({
    'ids': ['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates'],
    'expressions': ['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
    'start': '2020-01-01',
    'end': '2020-08-01',
    'interval': 'DAY'
})
var results = OutbreakLocation.evalMetrics(spec)
c3Viz(results)
Similarly, we don't have to explicitly create an EvalMetricsSpec type:
var results = OutbreakLocation.evalMetrics({
    'ids': ['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates'],
    'expressions': ['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
    'start': '2020-01-01',
    'end': '2020-08-01',
    'interval': 'DAY'
})
c3Viz(results)
To learn more about evaluating and visualizing metrics, please see the C3 AI Developer Documentation here:
Note: Metrics can only be evaluated on C3 Types that mix in the 'MetricEvaluatable' Type.
Official C3 AI Developer Documentation:
For most data analysis, C3 AI developers run the 'fetch' and 'evalMetrics' methods. This C3.ai DTI Quickstart guide provides an introduction to these methods, in which the C3 AI Suite is used as a read-only database accessed via APIs. In the following guides, you will learn how to run 'write' operations on the C3 AI Suite such as:
Welcome to the start of your experience with the C3 AI Suite.