This guide has four goals:

  1. Introduce researchers of all stripes to the system
  2. Help researchers determine what level of training they will need to leverage C3's resources
  3. Point researchers directly to relevant documentation they will need
  4. Provide worked examples of different research workflows and how they may be ported into
    the C3 environment, or how they may use C3's resources

The C3 AI Suite is a data analytics engine designed to make the ingestion and analysis of heterogeneous data sources
as painless as possible. The platform joins data from multiple sources into a single unified federated data image.
With the federated data image defined, C3 then provides an API to access that data and, in the case of time-series data,
perform numerous transformations and computations, all producing normalized time-series data at regular intervals.
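The "normalized time-series at regular intervals" idea can be sketched as an evalmetrics-style request. This is a hedged illustration only: the base URL, endpoint path, type name (`outbreaklocation`), and metric name (`JHU_ConfirmedCases`) are assumptions modeled on common REST conventions for the public COVID-19 Datalake, and may not match the current API.

```python
import json
import urllib.request

# Assumption: base URL and endpoint path are illustrative, not authoritative.
BASE_URL = "https://api.c3.ai/covid/api/1"

def build_evalmetrics_spec(ids, expressions, start, end, interval):
    """Assemble a request body asking for each metric expression to be
    evaluated per id, producing a regular time series over [start, end]."""
    return {
        "spec": {
            "ids": ids,
            "expressions": expressions,
            "start": start,
            "end": end,
            "interval": interval,
        }
    }

spec = build_evalmetrics_spec(
    ids=["UnitedStates"],               # hypothetical location id
    expressions=["JHU_ConfirmedCases"], # hypothetical metric name
    start="2020-03-01",
    end="2020-03-08",
    interval="DAY",
)

# Uncomment to issue the request (requires network access):
# req = urllib.request.Request(
#     f"{BASE_URL}/outbreaklocation/evalmetrics",
#     data=json.dumps(spec).encode(),
#     headers={"Content-Type": "application/json", "Accept": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

The key point is the spec shape: one request names the entities, the metrics, and the sampling interval, and the platform returns aligned, regularly spaced series rather than raw rows.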

If you want more background on the platform, there is a one-hour DTI webinar describing its capabilities here.
C3 also supports R and Python Jupyter notebook analysis of the federated data image. These notebooks provide a
great way for researchers to analyze data close to where it is stored. While C3 supports many data science
capabilities familiar to the researcher, some expected functionality may be missing. For these cases, C3 supports
implementing new data processing functions in Python and JavaScript.

Like any other API, porting your own workflows to C3 will take some care and time to learn properly. Please leverage
this guide to make understanding the platform and porting your workflow as quick and easy as possible.


  • Traditional HPC systems are similar to Hardware as a Service (HaaS), while C3 is more like a Platform as a Service (PaaS).
    Users are encouraged to work within the platform's API to get the best performance out of C3.
  • C3 offers a state-of-the-art data integration system as the basis for all data science operations.
    This is in contrast to HPC systems, where all components of data management and the analysis pipeline must be installed and
    managed independently.

What types of software can be run on C3?


Use this guide to determine what training you need to utilize C3's resources effectively. We have identified four
categories of usage of the platform. For each level, we include basic examples of workflows which might fall into that level,
pros and cons of operating at that level, and a list of training we recommend researchers
complete on the DTI training environment before starting their allocations. This will ensure researchers will
be able to use their allocations as efficiently as possible.

Examine the high-level overviews of each level below, then click the section titles for more in-depth
discussions of that level, including the recommended training.


Level 1: Public API Access

For many researchers, accessing the public API for the COVID-19 Federated Data Image will be enough for their research goals.
The public API provides fetch access to many datalake objects, metrics access to some time-series data such as case data,
and lets you pull local copies of those objects and metrics results into your local compute environment.
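Pulling datalake objects into a local environment might look like the sketch below. This is a hedged example: the endpoint path, type name (`outbreaklocation`), filter syntax, and the `objs` response field are assumptions about the API shape, not a definitive description of it.

```python
import json
import urllib.request

# Assumption: illustrative base URL; check the public API docs for the real one.
BASE_URL = "https://api.c3.ai/covid/api/1"

def build_fetch_spec(filter_expr, limit=10):
    """Assemble a fetch request body selecting matching datalake records."""
    return {"spec": {"filter": filter_expr, "limit": limit}}

def to_rows(response, fields):
    """Flatten returned objects into plain dicts for local analysis."""
    return [{f: obj.get(f) for f in fields} for obj in response.get("objs", [])]

spec = build_fetch_spec('contains(id, "California")', limit=5)

# Uncomment to issue the request (requires network access):
# req = urllib.request.Request(
#     f"{BASE_URL}/outbreaklocation/fetch",
#     data=json.dumps(spec).encode(),
#     headers={"Content-Type": "application/json", "Accept": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     rows = to_rows(json.load(resp), ["id", "name"])

# Local demo of the flattening step, with a stubbed response:
rows = to_rows({"objs": [{"id": "X", "name": "Y", "extra": 1}]}, ["id", "name"])
# rows == [{"id": "X", "name": "Y"}]
```

Once flattened like this, the records can be loaded into whatever local tooling you prefer (pandas, R data frames, CSV files).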


Level 2: Full Datalake Access

Full access to the Datalake offers access to all stored COVID-19 Datalake data while still allowing researchers to use whatever
analysis framework they choose with their own compute resources. This level offers the fastest startup time while still
ensuring access to all data. Once you learn how to query data in C3, that data can be streamed to your compute resources, where
you can use your language and tools of choice.
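Streaming a large query to your own compute resources is typically a paged fetch loop. In this sketch the `offset`/`limit` spec keys and the `objs`/`hasMore` response fields are assumptions about the response shape; the paging logic itself is the point.

```python
def stream_fetch(fetch_page, page_size=2000):
    """Yield records one at a time, requesting them page by page.
    fetch_page(spec) should perform one fetch call and return the
    parsed response dict (here assumed to contain "objs" and "hasMore")."""
    offset = 0
    while True:
        resp = fetch_page({"offset": offset, "limit": page_size})
        for obj in resp.get("objs", []):
            yield obj
        if not resp.get("hasMore"):
            break
        offset += page_size

# Demo with a stub in place of a real HTTP call:
def fake_fetch(spec):
    data = list(range(5))
    chunk = data[spec["offset"]: spec["offset"] + spec["limit"]]
    return {"objs": chunk,
            "hasMore": spec["offset"] + spec["limit"] < len(data)}

records = list(stream_fetch(fake_fetch, page_size=2))
# records == [0, 1, 2, 3, 4]
```

Because the generator yields records as pages arrive, downstream tools can begin processing before the full result set has been transferred.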


Level 3: Writing a C3 Package

Some researchers will want to write their own C3 package and leverage more of the C3 AI Suite. C3 allows researchers
to define their own types and methods to integrate their data into the C3 AI Suite, either independently or alongside the COVID-19
Datalake. This allows researchers to use C3 data analytics methods such as time-series metrics just as they would on other
Datalake data. Researchers will also have the ability to share their data with other researchers in the DTI by sharing their package:
adding another researcher's package as a dependency to your own will bring their data into
your package as well.

Level 4: Advanced C3 Platform Usage (In Progress)

Some researchers will want to bring state-of-the-art ML workflows to C3. The platform can support such workflows, but
extra work may be needed.

COVID-19 Datalake

As part of the initial C3 DTI, C3 is curating the COVID-19 Datalake. Follow the link above for more detailed information
about this Datalake.


This section introduces the process of getting access to C3. Generally speaking, once you receive your grant,
the DTI team will reach out and discuss your needs with you. The process will be:

  1. Determine which researchers will require access to a C3 environment.
  2. Each researcher will be given a developer portal login.
  3. Each researcher will be given a tag on the DTI training cluster.
  4. Once training is complete, discuss with the DTI team what your needs
    for a cluster will be.
  5. The DTI will work with C3 to stand up a new tag for your research.
  6. Access to that tag will be granted to your researchers.
  7. Research can then proceed until your allocation is exhausted!


See the above link for a comprehensive list and categorization of the available
training materials. This includes documentation, DTI introductions, and DTI-created examples and exercises.


No problem! Please send an email with a description of your issue,
and one of our team will work with you to resolve it.


If you feel aspects of this guide are incomplete or inaccurate, please send an email with
the issue or suggestion, and we will work to incorporate it and make the documentation better. We appreciate the
perspective more eyes can bring to a software project!