Ultimately, we want to connect our C3 cluster to a source of data. This is done through the process of 'Data Integration'. Generally, data flows from a source file or system into a so-called 'Canonical' Type, which is meant to mirror the data source directly. Next, a 'Canonical Transform' is defined, which connects the Canonical Type to another C3 Type that is part of your final data model. A general diagram follows:

Additional C3.ai resources:

Specialized Types

First, we'll discuss the specialized Types used throughout the Data Integration system and what happens once data enters the C3 AI Suite. It is easier to think about what happens inside the C3 AI Suite first. Once we've established how things work there, we'll cover how to get data into the first step of the C3 AI Suite's Data Integration system.

Canonical Types

A Canonical Type is the entry point of data into the C3 AI Suite. It is a special Type that mixes in the Canonical Type. Mixing in the Canonical Type tells the C3 AI Suite to add capabilities such as a RESTful API endpoint to ingest data, the ability to pick up data from a seed data directory, and the ability to kick off the Data Integration pipeline when new data arrives. By convention, the name of a Canonical Type should start with the word 'Canonical'. Its field names should match the field names in the intended source, and its fields should be primitive types. For example, let's look at the Type 'CanonicalSmartBulb' from the lightbulbAD tutorial package (see C3 lightbulbAD Package).

/*
 * Copyright 2009-2020 C3 (www.c3.ai). All Rights Reserved.
 * This material, including without limitation any software, is the confidential trade secret and proprietary
 * information of C3 and its licensors. Reproduction, use and/or distribution of this material in any form is
 * strictly prohibited except as set forth in a written license agreement with C3 and/or its authorized distributors.
 * This material may be covered by one or more patents or pending patent applications.
 */

/**
 * This type represents the raw data that will represent {@link SmartBulb} information.
 */
type CanonicalSmartBulb mixes Canonical<CanonicalSmartBulb> {
  /**
   * This represents the manufacturer of a {@link LightBulb}
   */
  Manufacturer: string

  /**
   * This represents the bulbType of a {@link LightBulb}
   */
  BulbType:     string

  /**
   * This represents the wattage of a {@link LightBulb}
   */
  Wattage:      decimal

  /**
   * This represents the id of a {@link LightBulb}
   */
  SN:           string

  /**
   * This represents the startDate of a {@link LightBulb}
   */
  StartDate:    datetime

  /**
   * This represents the latitude of a {@link SmartBulb}
   */
  Latitude:     double

  /**
   * This represents the longitude of a {@link SmartBulb}
   */
  Longitude:    double
}

Notice first that far fewer fields are present here than in the SmartBulb Type. This is because the Canonical Type is used only as an entry point to the C3 AI Suite. You don't need to define any methods, and the only fields necessary are those needed to hold data from the source. You'll also notice that the 'CanonicalSmartBulb' Type doesn't use the 'entity' keyword, which means it isn't persisted either.

Generally speaking, you need to define a new Canonical Type for each type of data source.
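
To make the "mirror the source" idea concrete, here is a minimal sketch in plain Python. It is not C3 code and does not use any C3 API: the dataclass `CanonicalSmartBulbRow`, the helper `parse_rows`, and the sample CSV values are all hypothetical stand-ins. The only thing taken from the Type above is the set of field names and their primitive kinds. The sketch assumes a CSV source whose header names match the canonical field names exactly.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class CanonicalSmartBulbRow:
    """Hypothetical plain-Python stand-in for CanonicalSmartBulb:
    one primitive-valued field per source column, no methods."""
    Manufacturer: str
    BulbType: str
    Wattage: float
    SN: str
    StartDate: datetime
    Latitude: float
    Longitude: float


def parse_rows(csv_text: str) -> List[CanonicalSmartBulbRow]:
    """Read CSV text whose header names match the canonical field names."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            CanonicalSmartBulbRow(
                Manufacturer=record["Manufacturer"],
                BulbType=record["BulbType"],
                Wattage=float(record["Wattage"]),
                SN=record["SN"],
                StartDate=datetime.fromisoformat(record["StartDate"]),
                Latitude=float(record["Latitude"]),
                Longitude=float(record["Longitude"]),
            )
        )
    return rows


# Made-up sample data, for illustration only.
SAMPLE_CSV = (
    "Manufacturer,BulbType,Wattage,SN,StartDate,Latitude,Longitude\n"
    "Acme,LED,9.5,SMBLB1,2011-01-01T04:00:00,37.48,-122.23\n"
)

rows = parse_rows(SAMPLE_CSV)
```

Because the field names are copied straight from the source header, a second source with a different layout would need its own record definition, which is the reason for one Canonical Type per type of data source.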

C3.ai resources on Canonical Types

Transform Types

With a Canonical Type defined to receive new data into the C3 AI Suite, we need to move this data into the data model.
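
Conceptually, this step is a field-by-field mapping from the canonical record to a Type in the data model. The sketch below illustrates that idea in plain Python only; it is not the C3 transform syntax and calls no C3 API. `SmartBulbRecord`, `FIELD_MAP`, and `transform` are hypothetical names, the target field names (`id`, `manufacturer`, `bulbType`, `wattage`) are assumptions inferred from the comments in 'CanonicalSmartBulb', and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SmartBulbRecord:
    """Hypothetical stand-in for the application-side SmartBulb Type."""
    id: str
    manufacturer: str
    bulbType: str
    wattage: float


# Target field -> canonical field. Assumed mapping, for illustration only.
FIELD_MAP = {
    "id": "SN",
    "manufacturer": "Manufacturer",
    "bulbType": "BulbType",
    "wattage": "Wattage",
}


def transform(canonical: Dict[str, Any]) -> SmartBulbRecord:
    """Build a target record by pulling each mapped field from the canonical record."""
    values = {target: canonical[source] for target, source in FIELD_MAP.items()}
    return SmartBulbRecord(**values)


# Made-up canonical record, for illustration only.
canonical_row = {"Manufacturer": "Acme", "BulbType": "LED", "Wattage": 9.5, "SN": "SMBLB1"}
bulb = transform(canonical_row)
```

Keeping the mapping in one declarative table, separate from both record definitions, mirrors the role described above for a Canonical Transform: the canonical side stays faithful to the source, and the data model side stays free of source-specific naming.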

Application Types

Basic Data Sources

CSV Files

JSON Files

Seed Data

Sending Data Via POST

Complex Data Sources

Custom External Database

C3 Supported Database technologies