C3 AI has established the C3 AI COVID-19 Data Lake, a federated data lake containing many datasets related in some way to the COVID-19 pandemic. Public access to the Data Lake is granted through a RESTful API (see the official documentation to learn more).
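As a rough illustration of what a call to the public API might look like, the sketch below builds (but does not send) a POST request using only the Python standard library. The base URL, the `fetch` action, the `OutbreakLocation` Type name, and the `{"spec": ...}` body shape are assumptions made for illustration and should be checked against the official documentation; `build_fetch_request` is a hypothetical helper, not part of any C3 AI library.

```python
import json
import urllib.request

# Assumed base URL for the public API; confirm it in the official documentation.
BASE_URL = "https://api.c3.ai/covid/api/1"


def build_fetch_request(type_name, spec):
    """Build (without sending) a POST request for a fetch-style call on a Type.

    Both the URL layout and the {"spec": ...} body shape are assumptions.
    """
    url = f"{BASE_URL}/{type_name.lower()}/fetch"
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# "OutbreakLocation" is used here only as an illustrative Type name.
request = build_fetch_request("OutbreakLocation", {"limit": 5})
print(request.get_method(), request.full_url)
```

To actually send the request, one could pass `request` to `urllib.request.urlopen` and parse the response with `json.load`, assuming the service is reachable and accepts this payload.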

For the public, this API provides a rich interface to a large collection of data without the need to integrate multiple databases. Members of the C3.ai DTI can have more direct access. Behind the C3 AI COVID-19 Data Lake is a package containing the definition of every Type, as well as instructions for how to fetch the underlying data and integrate it into the Data Lake's data model.

By learning to use the C3 AI platform, researchers can take advantage of capabilities such as defining their own Metrics, training machine learning models, and using additional helper functions that make navigating the Data Lake and its data model easier.

Below, for the benefit of C3.ai DTI researchers, we share an up-to-date diagram of the entire C3 AI COVID-19 Data Lake data model.

Each box denotes a C3 Type defined in the Data Lake package, and the connecting lines show the relationships between Types. Each box lists the Type's properties and, in most cases, the Type of each property.
