

Welcome to the C3 Digital Transformation Institute!

You have been given a grant as part of the new C3 Digital Transformation Institute (DTI)!
To make the start of your DTI experience as painless as possible, we have assembled a set of resources to:

  1. Introduce researchers of all stripes to the C3 system
  2. Help researchers determine what level of training they will need to leverage C3's resources
  3. Point researchers directly to relevant documentation they will need
  4. Provide worked examples of different research workflows and how they may be ported into
    C3's environment, or may use C3's resources

Introduction to the C3 system

C3 is a Java-based data analytics engine designed to make the ingestion and analysis of heterogeneous data sources
as painless as possible.

C3 provides a system to join data from multiple sources into a single unified federated data image.
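To make the idea of a unified data image concrete, the following is a minimal sketch in plain Python. It does not use C3's API: the `federate` function, the source names, and the sample records are all hypothetical and exist only to illustrate joining records from several sources on a shared key.

```python
from collections import defaultdict


def federate(sources, key):
    """Merge records from several sources into one record per key value.

    `sources` maps a source name to a list of dict records. This is an
    illustrative helper, not part of C3.
    """
    unified = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            entity = unified[record[key]]
            for field, value in record.items():
                if field == key:
                    entity[key] = value
                else:
                    # Prefix with the source name so fields from
                    # different sources never overwrite each other.
                    entity[f"{source_name}.{field}"] = value
    return dict(unified)


# Made-up sample data from two hypothetical sources.
sources = {
    "sensors": [{"id": "A", "temp": 21.5}, {"id": "B", "temp": 19.0}],
    "registry": [{"id": "A", "owner": "lab-1"}],
}

unified = federate(sources, key="id")
print(unified["A"])
```

Each entity ends up with one record that carries the fields of every source that mentions it, which is the property the analysis API then builds on.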

Once the federated data image is defined, C3 provides an API to access that data. For time-series data, the API can also
perform numerous transformations and computations, all of which produce normalized time-series data at regular intervals.
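As an illustration of what normalizing a time series to regular intervals means, here is a minimal sketch in plain Python. It is not C3's implementation or API: the `normalize` function, the averaging rule, and the carry-forward treatment of empty intervals are assumptions chosen for this example, and C3's own behavior may differ.

```python
import math
from datetime import datetime, timedelta


def normalize(samples, start, end, interval):
    """Average irregular (timestamp, value) samples into fixed-width buckets.

    Buckets with no samples carry the previous bucket's value forward
    (None if no earlier value exists). Illustrative only.
    """
    n_buckets = math.ceil((end - start) / interval)
    buckets = [[] for _ in range(n_buckets)]
    for ts, value in samples:
        if start <= ts < end:
            # timedelta // timedelta yields an integer bucket index.
            buckets[(ts - start) // interval].append(value)

    series = []
    last = None
    for i, values in enumerate(buckets):
        if values:
            last = sum(values) / len(values)
        series.append((start + i * interval, last))
    return series


# Made-up, irregularly spaced samples.
start = datetime(2020, 1, 1, 0, 0)
samples = [
    (datetime(2020, 1, 1, 0, 10), 10.0),
    (datetime(2020, 1, 1, 0, 50), 20.0),
    (datetime(2020, 1, 1, 2, 30), 40.0),
]

hourly = normalize(samples, start, start + timedelta(hours=3), timedelta(hours=1))
for timestamp, value in hourly:
    print(timestamp.isoformat(), value)
```

The output has exactly one value per hour regardless of how the input samples were spaced, which is what makes series from different sources directly comparable.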

While C3 supports many data science capabilities familiar to researchers, some expected functionality may be missing.
For these cases, C3 supports implementing new data processing functions in Python and JavaScript.

As with any other API, porting your own workflows to C3 will take some care and time to learn properly.

Please use this guide to help port your workflows.

How does C3 differ from traditional HPC systems?

  • Traditional HPC systems are similar to Hardware as a Service (HaaS), while C3 is more like a Platform as a Service (PaaS).
    Users are encouraged to work within the platform's API to achieve the best performance out of C3.
  • C3 offers a state-of-the-art data integration system as the basis for all Data Science operations.
    This is in contrast to HPC systems where all components of data management and the analysis pipeline must be installed and managed independently.

What types of software can be run on C3?

  • Nearly any Python module may be installed and used through pip or conda.
  • Nearly any R package may be installed and used within the R Jupyter environment.
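Before installing anything, it can be useful to check which modules are already importable in the current notebook kernel. The sketch below uses only the Python standard library and nothing specific to C3; the helper names `missing_modules` and `pip_install_command` are hypothetical. It builds the pip command but does not run it, and it assumes the import name matches the package name, which is not true for every package.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build (but do not run) a pip command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


# "json" ships with Python; the second name is deliberately nonexistent.
wanted = ["json", "surely_not_a_real_module_xyz"]
to_install = missing_modules(wanted)

print("Missing:", to_install)
print("Command:", pip_install_command(to_install))
```

Using `sys.executable -m pip` rather than a bare `pip` targets the interpreter that the notebook kernel is actually running, which avoids installing into a different environment by accident.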

What types of software cannot be run on C3?

  • General binary executables are not supported by C3 out of the box.
  • MPI-based Python software
  • Packages that must be built from scratch on the platform, or that require specific hardware drivers
  • Python modules that require specially built binaries may also fail to run.

How do I get started?

Level 1: Public API Access

COVID-19 API Documentation


Level 2: GUI based data analysis and full access to COVID-19 Datalake

Level 3: Utilize C3 AI Suite and Jupyter notebook analysis

Level 4: State-of-the-art ML workflows requiring special ML models and/or GPUs

Essential Concepts

C3 is quite different from traditional HPC resources.

C3 Types

Canonicals, Transforms, and Data Integration

Timeseries Analysis and Metrics

Machine Learning Pipelines

Jobs

How research workflows map onto C3

Here, we provide several examples of scientific workflows and how they can be efficiently mapped onto the C3 system.

Example 1

Example 2

Example 3

Help! This guide isn't helpful, or doesn't solve my problem!

No problem! You're not alone! Please send an email to help+c3ai@ncsa.illinois.edu with a description of your issue,
and a member of our team will work with you to resolve it.

Feedback

If you feel aspects of this guide are incomplete or inaccurate, please send an email to help+c3ai@ncsa.illinois.edu with the
issue or suggestion, and we will work to incorporate it into the documentation. We appreciate the new perspective
that more eyes can bring to a software project!

Your DTI Team

NCSA

Jay Roloff - Project Manager

Matthew Krafczyk - Data Analyst

Yifang Zhang - Data Analyst

Berkeley

Eric Fraser

Greg Merritt

Matt Podolsky
