Welcome to the C3 Digital Transformation Institute!
You have been given a grant as part of the new C3 Digital Transformation Institute (DTI)!
To make the start of your DTI experience as painless as possible, we have assembled a set of resources to:
- Introduce researchers of all stripes to the C3 system
- Help researchers determine what level of training they will need to leverage C3's resources
- Point researchers directly to relevant documentation they will need
- Provide worked examples of different research workflows and how they may be ported into
C3's environment, or may use C3's resources
If you have questions not covered by this guide, please contact the DTI team at email@example.com.
Introduction to the C3 system
C3 is a Java-based data analytics engine designed to make the ingestion and analysis of heterogeneous data sources
as painless as possible.
C3 provides a system to join data from multiple sources into a single unified federated data image.
With the federated data image defined, C3 then provides an API to access that data and, in the case of time-series data,
to perform numerous transformations and computations, all of which produce normalized time-series data at regular intervals.
While C3 supports many data science capabilities familiar to the researcher, some expected functionality may be missing.
As with any other API, porting your own workflows will take some care and time to learn properly.
Please leverage this guide to make porting your workflows as smooth as possible.
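To make the idea of "normalized time-series data at regular intervals" concrete, here is a minimal sketch in plain Python. It is not C3's implementation or API. The function name `normalize_to_interval`, the choice of averaging within each interval, and the use of `None` for empty intervals are all assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from statistics import mean


def normalize_to_interval(readings, start, end, step):
    """Average irregular (timestamp, value) readings into fixed-width buckets.

    Hypothetical helper for illustration; not part of any C3 API.
    Buckets cover [start, end) in increments of `step`. A bucket with no
    readings is reported as None.
    """
    n_buckets = (end - start) // step
    buckets = [[] for _ in range(n_buckets)]
    for timestamp, value in readings:
        index = (timestamp - start) // step
        # Ignore readings that fall outside the requested window.
        if 0 <= index < n_buckets:
            buckets[index].append(value)
    return [
        (start + i * step, mean(values) if values else None)
        for i, values in enumerate(buckets)
    ]


# Irregularly spaced readings, as they might arrive from different sources.
readings = [
    (datetime(2020, 4, 1, 9, 5), 10.0),
    (datetime(2020, 4, 1, 9, 40), 14.0),
    (datetime(2020, 4, 1, 11, 15), 20.0),
]

hourly = normalize_to_interval(
    readings,
    start=datetime(2020, 4, 1, 9, 0),
    end=datetime(2020, 4, 1, 12, 0),
    step=timedelta(hours=1),
)
```

Here `hourly` holds one entry per hour: the 09:00 bucket averages its two readings, the 10:00 bucket is empty, and the 11:00 bucket holds its single reading. Once series from different sources share the same regular grid, they can be compared or combined directly.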
Services available from C3
- COVID-19 Datalake: This unified federated Datalake includes data from numerous sources.
- C3 computing platform:
How does C3 differ from traditional HPC systems?
- Traditional HPC systems are similar to Hardware as a Service (HaaS), while C3 is more like a Platform as a Service (PaaS).
Users are encouraged to work within the platform's API to achieve the best performance out of C3.
- C3 offers a state-of-the-art data integration system as the basis for all Data Science operations.
This is in contrast to HPC systems where all components of data management and the analysis pipeline must be installed and managed independently.
What types of software can be run on C3?
- Nearly any Python module may be installed through pip or conda and used.
- Nearly any R package may be installed and used within the R Jupyter environment.
What types of software cannot be run on C3?
- General binary executables are not supported by C3 out of the box.
- MPI-based Python software
- Packages which must be built from scratch on the platform, or require specific hardware drivers
- Python modules which require specially built binaries may also fail to run.
How do I get started?
Use this guide to determine what training you need to utilize C3's resources effectively. We have separated
researchers into four levels based on what level of interaction with C3's resources they require. We include
basic examples of workflows which might fall into that level, pros and cons of operating on that level, and
a list of training resources we recommend researchers complete on the DTI training environment
before starting their C3 allocations. This will ensure researchers will be able to use their allocation as
efficiently as possible.
Examine the high level overviews of each level below, then click the section titles to go to more in-depth
discussions related to that level, like the recommended training.
Many researchers will simply want to leverage the C3 COVID-19 Federated Data Image.
Pros:
- Easy to integrate into existing scientific workflows and run on existing scientific computational hardware
- Publicly available API means no credentials are needed to access the data
- Assuming you have access to your own computational resources, you don't have to worry about allocations
on C3's platform.
Cons:
- All data used from the Datalake must be streamed to wherever you're processing data
- The performance benefits of working with the Datalake on the C3 platform will not be available.
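As a rough sketch of what this level of use can look like, the Python below builds an unauthenticated JSON request and streams the response to wherever your code is running. Everything specific here is an assumption for illustration: the base URL is a placeholder, and the type name `ExampleType`, the `fetch` path, and the `{"spec": ...}` request shape are hypothetical. Consult the Datalake documentation for the real endpoint and request format.

```python
import json
import urllib.request

# Placeholder only: replace with the base URL from the Datalake documentation.
BASE_URL = "https://example.invalid/api/1"


def build_fetch_request(type_name, spec):
    """Build (but do not send) a JSON POST request with no credentials.

    Hypothetical helper: the URL layout and body shape are assumptions.
    """
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{type_name}/fetch",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch(type_name, spec, timeout=30):
    """Send the request and decode the JSON reply (needs network access)."""
    request = build_fetch_request(type_name, spec)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request is purely local; no data moves until fetch() is called.
request = build_fetch_request("ExampleType", {"limit": 10})
```

Note that no authorization header is set, which reflects the first advantage listed above. Every call to `fetch` transfers the requested data over the network to your own machine, which is the cost described in the first disadvantage.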
This section introduces the process to access C3.
C3 Allocation Management
This section introduces how researchers will be expected to manage their allocation while on the C3 platform.
C3 is quite different from traditional HPC resources. We have written an introduction to C3 from the
perspective of a scientific researcher. We go over several important C3 concepts and relate them to
what scientists are more familiar with.
Canonicals, Transforms, and Data Integration
Timeseries Analysis and Metrics
Machine Learning Pipelines
See the above link for a comprehensive list and categorization of the available training
Help! This guide doesn't solve my problem!
No problem! You're not alone! Please send an email to firstname.lastname@example.org with a description of your issue
and a member of our team will work with you to resolve it.
If you feel aspects of this guide are incomplete or inaccurate, please send an email to email@example.com with the
issue or suggestion, and we will work to incorporate it to make the documentation better. We appreciate the new perspective
more eyes can bring to a software project!
Your DTI Team
Jay Roloff - Executive Director
Matthew Krafczyk - Data Analyst
Yifang Zhang - Data Analyst
Larry Rohrbach - Executive Director