The C3 AI Suite provides researchers with many tools to analyze data and to build and deploy machine learning models. This guide explains how to connect to a C3.ai cluster, access data using C3.ai methods, and convert C3.ai method outputs into an easy-to-analyze form. The guide also provides more detailed instructions for DTI members using the Covid-19 DataLake. The examples in this guide use the base DataLake available in the Git repository: https://github.com/c3aidti/dtiTraining

Please note, this guide covers how to run read-only queries on the C3 AI Suite. For more advanced topics, such as loading data, building metrics, or configuring and training machine learning models, please refer to the following wikis:

  • Data Integration (Not yet available)
  • Metrics (Not yet available)
  • Machine Learning (Not yet available)

Terminology

To help you understand the C3 AI Suite and this guide, we first introduce some terminology used throughout the suite.

  • Type: Everything within the C3 AI Suite is stored and accessed through Types. These are objects akin to a Java class which contain 'fields' and 'methods'. Some are persisted to internal databases, and others are not. Nearly every aspect of the C3 AI Suite is accessed through Types.
  • Field: A field of a C3 Type. This contains data associated with the Type.
  • Method: A method defined on a C3 Type.
  • Vanity URL: The URL at which a specific Tenant/Tag of a C3 Cluster can be accessed. The C3 Cluster itself also has a URL; however, most interaction with the C3 AI Suite is done through the vanity URL.
  • Cluster: A deployment of the C3 AI Suite. This can exist in the cloud or in a container. The C3 AI Suite is capable of running on top of numerous technologies, such as different cloud providers or virtualization strategies.
  • Tenant: A logical partition of a C3 Cluster. Although data from different Tenants may internally be stored in the same database, Users of the C3 AI Suite cannot access data across Tenants: users on one Tenant can't see data stored on another Tenant.
  • Tag: A slot on which a C3 package is run. Tags sit within a Tenant.
  • Package: The code which the C3 AI Suite runs on a Tag. This is what the developer edits.
  • Provisioning: Loading a Package onto a C3 Tenant/Tag.
  • Static Console: The main method C3 developers use to interact with their C3 Tag. You can access the static console at the url 'https://<vanity_url>/static/console' (Replace <vanity_url> with your vanity url.)
  • Metric: A data analysis object which turns timeseries-like data into a timeseries.

C3 Cluster Overview

The C3 AI Suite is a Platform as a Service (PaaS) system which can run on top of a number of virtualization technologies and platforms. Generally, a C3 Cluster consists of one or more master nodes, which orchestrate the jobs to be completed; worker nodes, which carry out scheduled tasks; and nodes dedicated to the technologies the platform is built on, such as PostgreSQL and Cassandra. On top of this physical computational structure sits a logical software structure with three levels: Cluster, then Tenant, then Tag. Each Cluster contains Tenants, which are logically separated from each other (e.g., Packages running on separate Tenants cannot view each other's data), and each Tenant contains Tags. Tags house C3 Packages, which are the actual code that C3 developers provision to the platform. A typical multi-user, multi-tenant C3 Cluster is shown in the logical diagram below:

To learn more about the general structure of a C3 cluster, please see the C3.ai resources here:

Provision a C3 Package

Provision your C3 package to your C3 cluster/tenant/tag following the instructions available at the DTI Provisioning Guide. DTI members wishing to execute the examples in this guide should provision the 'baseCovidDataLake' following the directions under the heading 'COVID-19 DataLake Provisioning'.

Connecting to a C3.ai cluster

The static console is the main place from which C3 developers configure and interact with the C3 AI Suite. We anticipate, however, that most DTI researchers will use Python for data analysis. Even so, the static console is an essential component of working with the C3 AI Suite, and you will use it frequently. For example, the static console is the best place to find documentation tailored to your C3 Package. It's also a great place to quickly test queries, since it requires no specialized environment: it's ready to go in your browser.

Accessing the Static Console

Once your C3 package has been provisioned to your Tenant/Tag, navigate to the static console page at 'https://<vanity_url>/static/console' (replace <vanity_url> with your vanity URL, e.g., https://dti-mkrafczyk.c3dti.ai/static/console). The static console page looks like this:

The 'Tools' menu in the upper-left corner gives access to the available tools. The most relevant is the Provisioner, though there are also utilities for loading JavaScript files, debugging JS code, and inspecting errors.

The 'Help' menu in the upper-left corner provides quick access to the console documentation and to a documentation portal hosted on the C3 Cluster.

Additionally, most tools are also accessible through a series of icons in the upper-right corner:

Using the Static Console

Once you're at the static console, the primary method of interaction is through the JavaScript console of your browser. When the static console page loads (or when you run the c3ImportAll() command), JavaScript methods associated with all of your Package's defined Types are populated. This allows you to run JavaScript code right in the console to interact with your C3 Package.

Most browsers open their developer tools, which include the JavaScript console, with the keyboard shortcut 'Ctrl+Shift+I' ('Cmd+Option+I' on macOS). If this shortcut doesn't work for you, consult your browser's documentation for its developer tools. With the JavaScript console open, the static console looks like this in the Firefox browser:

Finally, we can enter some JavaScript code to see the console in action!

Console Commands

We review here several highly used JavaScript console commands which are available on the static console page.

  • c3ImportAll: A console command which loads the API of the current C3 Package. This is necessary after provisioning a new Package if you haven't refreshed your static console page.
  • c3Grid: A console command to display a table of data contained within a C3 Type. (e.g., data returned from a fetch operation, or an evaluate operation among many others).
  • c3Viz: A console command which can produce quick visualizations of some C3 Types. (e.g., timeseries data like EvalMetricsResult)
  • c3ShowType: A console command which produces documentation about a given type. (e.g., c3ShowType(OutbreakLocation))

Official C3.ai Documentation For The Static Console

Using Python with the C3 AI Suite

We anticipate most DTI researchers will want to use Python for data analysis. There are two options to connect to a C3.ai Cluster via Python. Please follow the links below for detailed information about each.

Fetching Instances of Types

All data in the C3 AI Suite are stored in C3.ai Types. Users can access data from a C3.ai Type with the 'fetch' method. Behind the scenes, the 'fetch' method submits a query directly to the database underlying a C3.ai Type, then retrieves the query results and presents them to C3 AI Suite users.

The C3 AI Suite returns the response to a 'fetch' query as a FetchResult, which contains (1) the data from the C3.ai Type itself, in its 'objs' field, and (2) metadata for the 'fetch' query (e.g., the number of objects, and whether additional data exists in the database), ready for data analysis (see example below).

To learn more about the 'fetch' method, please see the following C3.ai Developer Documentation:

Users can also pass a FetchSpec (a set of parameters) to the 'fetch' method to describe which data to retrieve (e.g., only retrieve gene sequences collected in Germany). The FetchSpec can be 'empty' (e.g., OutbreakLocation.fetch()) or contain several parameters to return a subset of the data.

Some example FetchSpec parameters include:

  • filter: Filter expression to return a subset of the data (e.g., age <= 20). Filter expressions must evaluate to a Boolean type (i.e., true or false)
  • limit: the maximum number of rows to return. By default, if no limit is specified, the C3 AI Suite returns 2,000 rows from the C3.ai Type; the examples below use a limit of -1 to return all matching rows. Specifying a small limit is often helpful when debugging a 'fetch' call, to avoid returning too many records.
  • include: Specifies the particular fields of a C3.ai Type to return in the FetchResult. By default, if no include spec is defined, all fields of the C3.ai Type are returned.
  • order: Specifies the order in which to return the query's results, as an expression such as 'ascending(field)' or 'descending(field)'.

Note: Please see this C3.ai Developer Documentation for full list of FetchSpec parameters: https://developer.c3.ai/docs/7.12.17/type/FetchSpec
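Because a FetchSpec is just a set of named parameters, it can be convenient to assemble one programmatically in Python. The sketch below is a hypothetical helper (build_fetch_spec is our own name, not part of the c3 module) that omits unset parameters, joins several filter clauses with the '&&' operator used in the evaluate examples later in this guide, and joins include fields into the comma-separated string format shown in the examples. It only builds a plain dictionary; nothing here calls the C3 AI Suite.

```python
from typing import Iterable, Optional


def build_fetch_spec(filters: Iterable[str] = (),
                     limit: Optional[int] = None,
                     include: Iterable[str] = (),
                     order: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble a FetchSpec-style dictionary,
    leaving out any parameter that was not provided."""
    spec = {}
    filters = list(filters)
    if len(filters) == 1:
        spec['filter'] = filters[0]
    elif filters:
        # Parenthesize each clause and combine them with && (logical AND)
        spec['filter'] = ' && '.join('(' + f + ')' for f in filters)
    if limit is not None:
        spec['limit'] = limit
    include = list(include)
    if include:
        # include is a single comma-separated string of field names
        spec['include'] = ', '.join(include)
    if order is not None:
        spec['order'] = order
    return spec


spec = build_fetch_spec(filters=['exists(latestTotalPopulation)'],
                        limit=-1,
                        include=['id', 'name', 'latestTotalPopulation'],
                        order='descending(latestTotalPopulation)')
print(spec)
```

The returned dictionary could then be passed to a fetch call, e.g., c3.OutbreakLocation.fetch(spec), in the same way as the literal dictionaries in the examples below.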

Examples of Fetch Calls

The OutbreakLocation Type contains information about the various locations for which the Covid-19 DataLake has virus-related information. We can fetch the OutbreakLocation records for which the 'latestTotalPopulation' field exists (i.e., is not null), and retrieve them in descending order of 'latestTotalPopulation'.

res = OutbreakLocation.fetch({
	'limit': -1,
	'filter': 'exists(latestTotalPopulation)',
	'order': 'descending(latestTotalPopulation)',
	'include': 'id, name, latestTotalPopulation, populationOfAllChildren, countryArea, countryCode'
})

We can then display these results in the C3 static console with the c3Grid command (e.g., c3Grid(res)).

You can run this same fetch in Python:

raw_data = c3.OutbreakLocation.fetch({
	'limit': -1,
	'filter': 'exists(latestTotalPopulation)',
	'order': 'descending(latestTotalPopulation)',
	'include': 'id, name, latestTotalPopulation, populationOfAllChildren, countryArea, countryCode'
})

Additional details on "Fetching in Python" are available in this C3.ai Developer Documentation: https://developer.c3.ai/docs/7.12.0/topic/ds-jupyter-notebooks

Additional examples of fetch calls can be found in our examples here:

The fetchCount Method

Another useful fetch command is fetchCount. It accepts the same filter as the fetch commands above, but returns only the number of records matching the filter rather than the records themselves. This is useful for checking whether a given search is refined enough before fetching the data.

OutbreakLocation.fetchCount({'filter': 'exists(latestTotalPopulation)'})

The equivalent Python call is:

c3.OutbreakLocation.fetchCount(spec={'filter': 'exists(latestTotalPopulation)'})
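Combined with the 'limit' parameter, fetchCount can help retrieve a large result set in pages instead of with a single limit of -1. The sketch below assumes that FetchSpec also accepts an 'offset' parameter; please check the FetchSpec documentation linked above for your C3 version. To keep it runnable without a cluster, the two C3 calls are passed in as plain functions, and fetch_all_in_pages is a hypothetical name.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def fetch_all_in_pages(fetch_count: Callable[[Dict[str, Any]], int],
                       fetch_page: Callable[[Dict[str, Any]], List[Record]],
                       spec: Dict[str, Any],
                       page_size: int = 2000) -> List[Record]:
    """Hypothetical helper: count matching records first, then fetch them
    page by page using 'limit' and an assumed 'offset' FetchSpec parameter."""
    # fetchCount only needs the filter, not limit/include/order
    count_spec = {'filter': spec['filter']} if 'filter' in spec else {}
    total = fetch_count(count_spec)
    records: List[Record] = []
    for offset in range(0, total, page_size):
        # Copy the caller's spec and override the paging parameters
        page_spec = dict(spec, limit=page_size, offset=offset)
        records.extend(fetch_page(page_spec))
    return records
```

Against a real cluster, the two functions might be lambda s: c3.OutbreakLocation.fetchCount(spec=s) and lambda s: c3.OutbreakLocation.fetch(s).objs.toJson(). Including an 'order' in the spec is advisable so that consecutive pages come from a consistent ordering.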


Converting Fetch results to usable forms in Jupyter Notebook

When using a Jupyter Notebook, C3.ai developers typically convert FetchResults into other data structures for analysis. This section shows how to convert FetchResults into easy-to-analyze forms.

Python

In Python, first retrieve the 'objs' field from the FetchResult object, and then call its toJson() function. The toJson() function returns a list of dictionaries, each with keys equal to the requested fields of the fetched C3.ai Type (plus some metadata keys such as 'meta', 'type', and 'version'). Using the Pandas library, this list can be turned into an analysis-ready DataFrame, as the example below shows.

A Code Example in Jupyter Notebook:

import pandas as pd

# Convert the fetched records (a list of dictionaries) into a DataFrame
df = pd.DataFrame(raw_data.objs.toJson())

# Drop bookkeeping columns that are not needed for analysis;
# errors='ignore' skips any column that is not present
df.drop(columns=['meta', 'type', 'version', 'id'], inplace=True, errors='ignore')
df.head()


Users can then manipulate the resulting DataFrame using common Python libraries and frameworks.

ExpressionEngineFunctions

The C3 AI Suite also provides a pre-built library of ExpressionEngineFunctions. These functions take a variety of arguments and perform various data processing tasks. For example, the function 'contains' takes two strings as arguments and checks whether the first argument contains the second. The function 'lowerCase' takes a string as input and returns the same string with all letters lowercased. In addition to these string processing functions, the C3 AI Suite's ExpressionEngine includes many math functions, such as 'log', 'avg', and 'abs', which operate on various numeric input types (e.g., int, double, float).

The ExpressionEngine Functions are used in several places such as:

  • fetch filters
  • simple and compound metric expressions
  • tsDecl metric values

Please see this C3.ai Developer Documentation for a full list of the C3 AI Suite's ExpressionEngineFunctions: https://developer.c3.ai/docs/7.12.0/type/ExpressionEngineFunction
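To make the semantics of the two string functions concrete, here is a plain-Python analogue of a fetch filter that nests them for a case-insensitive match, such as contains(lowerCase(name), 'illinois'). The Python functions are local stand-ins written for illustration only (they are not part of any C3 library), the filter string's exact syntax is an assumption modeled on the quoting used elsewhere in this guide, and the sample records are made up.

```python
from typing import Dict, List


def contains(text: str, sub: str) -> bool:
    """Local stand-in for 'contains': does text contain sub?"""
    return sub in text


def lower_case(text: str) -> str:
    """Local stand-in for 'lowerCase': return text with all letters lowercased."""
    return text.lower()


# The filter string a fetch call might send (assumed syntax, for illustration)
FILTER = "contains(lowerCase(name), 'illinois')"


def matches_filter(record: Dict[str, str]) -> bool:
    # Nesting lower_case inside contains makes the match case-insensitive
    return contains(lower_case(record['name']), 'illinois')


sample = [{'name': 'Illinois'}, {'name': 'Indiana'}, {'name': 'ILLINOIS County'}]
selected: List[Dict[str, str]] = [r for r in sample if matches_filter(r)]
print(selected)
```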

Simple Expressions on Types using Evaluate

The C3 AI Suite provides the 'evaluate' method to compute simple expressions on the data stored within a C3 Type. (e.g., compute the average area of all OutbreakLocations which are countries and for which we have area information)

The evaluate function takes the parameters:

  • 'projection': [Required] A comma-separated list of valid expressions or ExpressionEngineFunctions to evaluate on the aggregated Type data. Behind the scenes, the C3 AI Suite translates these expressions into SQL queries, but not all ExpressionEngineFunctions can be evaluated in SQL. In those cases, evaluate tries to compute the expressions itself, but without other SQL capabilities such as grouping or ordering.
  • 'group': A comma-separated list of valid expressions or ExpressionEngineFunctions to use as the GROUP BY clause of the SQL query.
  • 'having': An SQL-style HAVING clause, which filters the grouped results.
  • 'order': A comma-separated list of valid expressions or ExpressionEngineFunctions by which to order the results.
  • 'filter': A fetch filter expression which restricts the rows evaluate is run against.
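Since evaluate is translated into SQL, the console example below can be read roughly as SELECT avg(countryArea), locationType ... WHERE both fields exist ... GROUP BY locationType. As a mental model only, the following plain-Python sketch computes the same aggregation over a small in-memory list of made-up rows; it is not how the C3 AI Suite executes evaluate.

```python
from collections import defaultdict
from typing import Any, Dict, List


def avg_area_by_location_type(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Local analogue of evaluate with
    projection='avg(countryArea), locationType', group='locationType',
    filter='exists(countryArea) && exists(locationType)'."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        # filter: skip rows where either field is missing (null)
        if row.get('countryArea') is None or row.get('locationType') is None:
            continue
        # group: accumulate sums and counts per locationType
        totals[row['locationType']] += row['countryArea']
        counts[row['locationType']] += 1
    # projection: one average per group
    return {key: totals[key] / counts[key] for key in totals}


rows = [  # made-up sample rows
    {'locationType': 'country', 'countryArea': 10.0},
    {'locationType': 'country', 'countryArea': 20.0},
    {'locationType': 'state', 'countryArea': 5.0},
    {'locationType': 'state'},   # no countryArea: filtered out
    {'countryArea': 3.0},        # no locationType: filtered out
]
print(avg_area_by_location_type(rows))
```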

On the static console, the 'c3Grid' command displays evaluate's result nicely:

var eval_result = OutbreakLocation.evaluate({
    'projection': 'avg(countryArea), locationType',
    'group': 'locationType',
    'filter': 'exists(countryArea) && exists(locationType)'
})
c3Grid(eval_result)


We can also use 'evaluate' in Python, but converting its result into a Pandas DataFrame requires a helper function. We've defined one for you in the DTI's c3python module, available here: https://github.com/c3aidti/c3python

eval_spec = {
    'projection': 'avg(countryArea), locationType',
    'group': 'locationType',
    'filter': 'exists(countryArea) && exists(locationType)'
}
eval_res = c3.OutbreakLocation.evaluate(eval_spec)
df = c3python.EvaluateResultToPandas(result=eval_res, eval_spec=eval_spec)


Here's another example in Python:

spec = c3.EvaluateSpec(
    projection="ethnicity, count(ethnicity)",
    group="ethnicity"
)
c3python.EvaluateResultToPandas(result=c3.SurveyData.evaluate(spec), eval_spec=spec)

To learn more about the evaluate method, please see the C3.ai Developer Documentation here:

Developing Metrics on Timeseries data

The C3 AI Suite also offers several features for handling timeseries data. To work with timeseries, C3.ai developers typically use simple and compound metrics. These metrics are used in several places in the C3 AI Suite, such as:

  • Alerts and Application Logic
  • Machine Learning Features
  • User Interface (to Visualize Data)

Simple Metrics

Simple metrics allow C3.ai developers to produce timeseries from raw data, and in practice they are often used to construct more advanced metrics (i.e., Compound Metrics). Simple metrics are linked to a specific C3.ai Type and reference the timeseries data stored within that C3.ai Type. To declare a simple metric, users should specify the following fields:

  1. id: simple metric's unique id, which should follow the convention "name_srcType" (e.g., Apple_DrivingMobility_OutbreakLocation)
  2. name: simple metric's name (e.g., Apple_DrivingMobility)
  3. description: simple metric's description (optional field)
  4. srcType: the C3.ai Type the simple metric is analyzed on (e.g., OutbreakLocation)
  5. path: the path from the srcType to the C3.ai Type that stores the raw data referenced by the simple metric (e.g., pointMeasurements). Note: if the srcType itself stores the raw data referenced by the simple metric, the path field is optional.
  6. expression: the expression (or ExpressionEngineFunction) applied to the raw data referenced by the simple metric (e.g., avg(avg(normalized.data.quantity))). Note: the "normalized" keyword instructs the simple metric to use normalized (instead of raw) data on the C3 AI Suite (to learn more about normalization, see this C3.ai Developer Documentation: https://developer.c3.ai/docs/7.12.17/topic/normalization).
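When declaring several simple metrics, deriving the id from the "name_srcType" convention avoids typos. The sketch below is a hypothetical helper (simple_metric_fields is our own name) that collects the fields listed above into a dictionary and includes the optional ones only when given. It assumes, as in the example that follows, that c3.SimpleMetric accepts these fields as keyword arguments; the helper itself does not touch the c3 module.

```python
from typing import Dict, Optional


def simple_metric_fields(name: str, src_type: str, expression: str,
                         path: Optional[str] = None,
                         description: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: collect SimpleMetric fields, deriving the id
    from the 'name_srcType' naming convention."""
    fields = {
        'id': f'{name}_{src_type}',
        'name': name,
        'srcType': src_type,
        'expression': expression,
    }
    # path and description are optional, so include them only when given
    if path is not None:
        fields['path'] = path
    if description is not None:
        fields['description'] = description
    return fields


fields = simple_metric_fields('Apple_DrivingMobility', 'OutbreakLocation',
                              'avg(avg(normalized.data.quantity))',
                              path='pointMeasurements')
# On a cluster (assumption): met = c3.SimpleMetric(**fields)
print(fields['id'])
```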

Here is an example of a Simple Metric:

met = c3.SimpleMetric(
    id='JHU_ConfirmedCases2_OutbreakLocation',
    name='JHU_ConfirmedCases2',
    srcType='OutbreakLocation',
    path="aggregateMeasurements.(measurementType == 'confirmed' && origin == "
         "'Johns Hopkins University')",
    expression='interpolate(avg(avg(normalized.data.value)), "PREVIOUS", "MISSING")'
)

To learn more about Simple Metrics, please see the C3.ai Developer Documentation here:

Another kind of SimpleMetric is the tsDecl (Timeseries Declaration) metric. tsDecl metrics are often used to turn non-timeseries raw data (e.g., event data, status data, or data with irregular intervals) into timeseries. tsDecl metrics have the same fields as a standard SimpleMetric, except that the 'tsDecl' field replaces the 'expression' field. tsDecl metrics can give users the flexibility to define metrics that the 'expression' field may not support. Using a tsDecl metric, the above metric can be rewritten as:

met = c3.SimpleMetric(
    id='JHU_ConfirmedCases3_OutbreakLocation',
    name='JHU_ConfirmedCases3',
    srcType='OutbreakLocation',
    path="aggregateMeasurements.(measurementType == 'confirmed' && origin == "
         "'Johns Hopkins University')",
    tsDecl={
        'data': 'data',
        'treatment': 'AVG',
        'start': 'start',
        'value': 'value'
    }
)

Please note that the above examples are not yet tied to a working exercise. They will be updated soon with versions backed by a working exercise.
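Conceptually, we read the tsDecl above as: take each object in the collection named by 'data', use its 'start' field as the timestamp and its 'value' field as the measurement, and combine all values that fall into the same interval with the 'AVG' treatment. That reading is our interpretation, not a specification. The plain-Python sketch below illustrates the idea on made-up irregular events bucketed into days; how the C3 AI Suite actually handles interval boundaries, time zones, and empty intervals may differ (here empty days are simply None).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, List, Optional, Tuple


def daily_avg_series(events: List[Tuple[datetime, float]],
                     start: date, end: date) -> List[Optional[float]]:
    """Bucket (timestamp, value) events into days in [start, end) and average
    the values in each day; days without events are None (missing)."""
    buckets: Dict[date, List[float]] = defaultdict(list)
    for ts, value in events:
        buckets[ts.date()].append(value)
    series: List[Optional[float]] = []
    day = start
    while day < end:
        values = buckets.get(day)  # .get does not create empty buckets
        series.append(sum(values) / len(values) if values else None)
        day += timedelta(days=1)
    return series


events = [  # made-up events at irregular times
    (datetime(2020, 3, 1, 8), 10.0),
    (datetime(2020, 3, 1, 20), 20.0),
    (datetime(2020, 3, 3, 12), 7.0),
]
print(daily_avg_series(events, date(2020, 3, 1), date(2020, 3, 4)))
```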

To learn more about tsDecl metrics, please see the C3.ai Developer Documentation here:

Compound Metrics

Compound metrics allow C3.ai developers to manipulate or combine existing metrics into more complex timeseries. Compound metrics are built on top of one or more existing Simple or Compound metrics. Note that to evaluate a Compound metric on a C3.ai Type, every Simple metric used in that Compound metric must also be defined on that C3.ai Type; otherwise, an error is returned.

To declare a compound metric, users should specify the following fields:

  1. id: compound metric's unique id, typically the same as name (e.g., BLS_UnemploymentRate)
  2. name: compound metric's name (e.g., BLS_UnemploymentRate)
  3. description: compound metric's description (optional field)
  4. expression: the expression (or ExpressionEngineFunction) applied to the metrics underlying the Compound metric (e.g., "BLS_LaborForcePopulation ? 100 * BLS_UnemployedPopulation / BLS_LaborForcePopulation : null")

An example CompoundMetric is:

met = c3.CompoundMetric(
	id='JHU_CaseFatalityRate',
	name='JHU_CaseFatalityRate',
	expression='JHU_ConfirmedDeaths/JHU_ConfirmedCases',
)

Please note, the above example is not tied to any sample exercises or hands-on tutorials. Sample exercises and hands-on tutorials will be added to this Wiki shortly.
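Conceptually, a compound metric's expression combines the underlying metrics' timeseries interval by interval. The plain-Python sketch below mimics that locally for two already-evaluated, aligned series and shows why the BLS example guards its division with a ternary ("denominator ? ... : null"): JHU_CaseFatalityRate divides by JHU_ConfirmedCases, which can be zero or missing early in a series. Following the BLS pattern, a guarded version might read 'JHU_ConfirmedCases ? JHU_ConfirmedDeaths / JHU_ConfirmedCases : null'; we have not verified how the platform treats the unguarded form. guarded_ratio is a hypothetical name, and the sample numbers are made up.

```python
from typing import List, Optional

Series = List[Optional[float]]


def guarded_ratio(numerator: Series, denominator: Series,
                  scale: float = 1.0) -> Series:
    """Pointwise scale * numerator / denominator, yielding None where the
    denominator is zero or missing (like 'den ? scale * num / den : null')."""
    if len(numerator) != len(denominator):
        raise ValueError('series must be aligned to the same intervals')
    out: Series = []
    for num, den in zip(numerator, denominator):
        # 'not den' is True for both None and 0, mirroring the ternary guard
        if not den or num is None:
            out.append(None)
        else:
            out.append(scale * num / den)
    return out


deaths = [1.0, 2.0, None, 3.0]
cases = [10.0, 0.0, 5.0, None]
print(guarded_ratio(deaths, cases))
```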

To learn more about Compound metrics, please see the C3.ai Developer Documentation here:

Finding, Evaluating, and Visualizing Metrics

Users can evaluate and visualize metrics built in the C3 AI Suite via the JavaScript console or a hosted Jupyter notebook.

Finding Metrics

All metrics that users build and deploy in the C3 AI Suite are also stored in C3.ai Types. To view a list of all the simple and compound metrics applicable to a C3.ai Type, run the 'listMetrics' method, as shown below:

JavaScript:

var metrics = OutbreakLocation.listMetrics()
c3Grid(metrics)

Python:

import pandas as pd
pd.DataFrame(c3.OutbreakLocation.listMetrics().toJson())

DTI Members using the Covid-19 DataLake: While listMetrics does return a list of metrics, the list is fairly bare-bones when a metric's 'description' field is not filled in. The Covid-19 DataLake API documentation provides an extensive list of production-ready metrics along with detailed descriptions and usage examples. Please see that documentation here: https://c3.ai/covid-19-api-documentation/

After finding a metric, the next step is to evaluate it on data in a C3.ai Type.

Evaluating Metrics

Metrics are evaluated with either the 'evalMetrics' or the 'evalMetricsWithMetadata' method. Behind the scenes, both methods fetch raw data from a C3.ai Type and transform it into easy-to-analyze timeseries data. 'evalMetrics' evaluates metrics provisioned (deployed) to a C3.ai tenant/tag. 'evalMetricsWithMetadata' lets users evaluate metrics that are either provisioned to a C3.ai tenant/tag or defined on the fly in the JavaScript console or a hosted Jupyter notebook (typically for debugging).

To learn more about the differences between 'evalMetrics' and 'evalMetricsWithMetadata' see the C3.ai Developer Documentation here: https://developer.c3.ai/docs/7.12.0/type/MetricEvaluatable

To evaluate a metric, users must provide the following parameters (called an EvalMetricsSpec) to the 'evalMetrics' or 'evalMetricsWithMetadata' method:

  1. ids ([string]): A list of ids in the C3.ai Type, on which you want to evaluate the metrics (e.g., "Germany", "California_UnitedStates")
  2. expressions ([string]): A list of metrics to evaluate (e.g., "JHU_ConfirmedCases", "Apple_DrivingMobility")
  3. start (datetime): Start datetime of the time range to be evaluated (in ISO 8601 format) (e.g., "2020-01-01")
  4. end (datetime): End datetime of the time range to be evaluated (in ISO 8601 format) (e.g., "2020-08-01")
  5. interval (string): Desired interval for the resulting timeseries data (e.g., MINUTE, HOUR, DAY, MONTH, YEAR)
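Mistakes in these parameters (an empty id list, a start date after the end date, a misspelled interval) are cheap to catch before sending a request. The sketch below is a hypothetical helper (make_eval_metrics_spec is our own name) that builds the dictionary form of the spec used in the Python examples below and validates it locally. It only accepts date-only ISO 8601 strings such as '2020-01-01', and it restricts the interval to the values listed above; the platform may accept other formats and intervals.

```python
from datetime import date
from typing import Dict, List, Union

# Intervals listed in this guide; the platform may accept others
INTERVALS = {'MINUTE', 'HOUR', 'DAY', 'MONTH', 'YEAR'}


def make_eval_metrics_spec(ids: List[str], expressions: List[str],
                           start: str, end: str,
                           interval: str = 'DAY') -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: build an EvalMetricsSpec-style dictionary and
    catch common mistakes before the request is sent."""
    if not ids or not expressions:
        raise ValueError('ids and expressions must be non-empty lists')
    # date.fromisoformat parses YYYY-MM-DD strings and raises ValueError otherwise
    if date.fromisoformat(start) >= date.fromisoformat(end):
        raise ValueError('start must be earlier than end')
    if interval not in INTERVALS:
        raise ValueError(f'unexpected interval: {interval}')
    return {'ids': list(ids), 'expressions': list(expressions),
            'start': start, 'end': end, 'interval': interval}


spec = make_eval_metrics_spec(['Illinois_UnitedStates', 'UnitedStates'],
                              ['JHU_ConfirmedCases', 'JHU_ConfirmedDeaths'],
                              '2020-01-01', '2020-08-01')
# On a cluster (assumption): results = c3.OutbreakLocation.evalMetrics(spec=spec)
print(spec)
```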

Here's an example of evaluating a metric in Python:

spec = c3.EvalMetricsSpec(
  ids=[ 'Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates' ],
  expressions=[ 'JHU_ConfirmedCases', 'JHU_ConfirmedDeaths' ],
  start='2020-01-01',
  end='2020-08-01',
  interval='DAY',
)

results = c3.OutbreakLocation.evalMetrics(spec=spec)

In Python, you can also specify the spec as a dictionary, without creating an EvalMetricsSpec Type:

results = c3.OutbreakLocation.evalMetrics(spec={
	'ids': [ 'Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates' ],
	'expressions': [ 'JHU_ConfirmedCases', 'JHU_ConfirmedDeaths' ],
	'start': '2020-01-01',
	'end': '2020-08-01',
	'interval': 'DAY',
})


The C3 AI Suite returns the evaluated metric results (a timeseries) as an 'EvalMetricsResult'. With helper functions, C3.ai developers can then convert this timeseries into a Pandas DataFrame (via the 'Dataset' type) for further data analysis or model development in a Jupyter notebook, as shown below.

ds = c3.Dataset.fromEvalMetricsResult(result=results)
df = c3.Dataset.toPandas(dataset=ds)

Additionally, users can visualize evaluated metric results directly in the web browser (i.e., in the JavaScript console) with the 'c3Viz' function.

Here's an example of evaluating and visualizing in JavaScript console.

var spec = EvalMetricsSpec.make({
	'ids': ['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates' ],
	'expressions': [ 'JHU_ConfirmedCases', 'JHU_ConfirmedDeaths' ],
	'start': '2020-01-01',
	'end': '2020-08-01',
	'interval': 'DAY'
})

var results = OutbreakLocation.evalMetrics(spec)
c3Viz(results)

Similarly, we don't have to explicitly create an EvalMetricsSpec type:

var results = OutbreakLocation.evalMetrics({
    'ids': ['Illinois_UnitedStates', 'California_UnitedStates', 'UnitedStates' ],
    'expressions': [ 'JHU_ConfirmedCases', 'JHU_ConfirmedDeaths' ],
    'start': '2020-01-01',
    'end': '2020-08-01',
    'interval': 'DAY'
})
c3Viz(results)


To learn more about evaluating and visualizing metrics, please see the C3.ai Developer Documentation here:

Note: Metrics can only be evaluated on C3.ai Types that mix in the 'MetricEvaluatable' Type.

Conclusion

Official C3.ai Developer Documentation:

Review and Next Steps


For most data analysis, C3.ai developers use the 'fetch' and 'evalMetrics' methods. This C3.ai DTI Quickstart guide introduces these methods, using the C3 AI Suite as a read-only database accessed via APIs. In the following guides, you will learn how to run 'write' operations on the C3 AI Suite, such as:

  • Defining new types
  • Loading new data
  • Cleaning up databases in your tag
  • Training machine learning models
  • And more

Welcome to the start of your experience with the C3 AI Suite.
