
...

Deep Learning - Text Analysis (Minu Mathew)

...

  1. Model selection

    1. Structured vs. unstructured data.
    2. Pre-trained models (called model zoos, model hubs, or model gardens): PyTorch, PyTorch Lightning, TF, TF Model Garden, HuggingFace, Papers with Code, https://github.com/openvinotoolkit/open_model_zoo
      1. B-tier: https://github.com/collections/ai-model-zoos, CoreML (iOS), Android, web/JS, ONNX zoo. Largest quantity, hit-or-miss quality.
      2. Fastest to use: SkLearn (AutoML).
      3. PyTorch Lightning.
      4. FastAI.
      5. XGBoost & LightGBM.
    3. For measuring success, I like F1 scores (can be weighted); see the sketch just below.
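
    A minimal sketch of a weighted F1 score using scikit-learn; the label arrays are made up for illustration.

    from sklearn.metrics import f1_score

    # Hypothetical true and predicted labels for a 3-class problem.
    y_true = [0, 1, 2, 2, 1, 0, 2]
    y_pred = [0, 2, 2, 2, 1, 0, 1]

    # average="weighted" averages per-class F1 scores weighted by class support,
    # which is useful when classes are imbalanced.
    print(f1_score(y_true, y_pred, average="weighted"))
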
  2. Data pipelines

    1. Luigi, Airflow, Ray, Snake(?), Spark.
    2. Globus, APIs, S3 buckets, HPC resources.
    3. Configuring and running large ML training jobs on Delta.
    4. Normal scale: Pandas, NumPy.
    5. Big:
      1. Spark (PySpark)
      2. Dask - distributed Pandas and NumPy
      3. XArray
      4. Rapids
        1. cuDF - CUDA dataframes
        2. Dask-cuDF - distributed dataframes (for data that can't fit in one GPU's memory)
      5. Rapids w/ Dask (cuDF) - distributed, on-GPU calculations (see their blog post on reading large CSVs)
    1. Key idea: make data as info-dense as possible.
    2. Limit correlation between input variables (Pearson or Chi-squared); this is filter-based selection, but you can also use permutation-based importance.
    3. Common workflow: normalization → Pearson correlation → XGBoost feature importance → Kernel PCA dimensionality reduction (sketched in code after this list).

      Data cleaning (also known, in ML jargon, as “feature engineering”)

    4. Always normalize both inputs and outputs.
       (Figure: original art by Kastan Day, KastanDay/wascally_wabbit on github.com)
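
    A minimal sketch of the workflow above (normalize → correlation filter → XGBoost importance → Kernel PCA). The file name, target column, and thresholds are made-up assumptions.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import KernelPCA
    from xgboost import XGBRegressor

    # Hypothetical dataset: a CSV of features with a "target" column.
    X = pd.read_csv("features.csv")
    y = X.pop("target")

    # 1. Normalize inputs (do the same for outputs in a regression setting).
    X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

    # 2. Filter-based selection: drop one of each highly correlated pair (Pearson).
    corr = X_scaled.corr().abs()
    drop = [c for i, c in enumerate(corr.columns) if (corr.iloc[:i][c] > 0.95).any()]
    X_filtered = X_scaled.drop(columns=drop)

    # 3. Rank the remaining features by XGBoost importance and keep the top 20.
    model = XGBRegressor(n_estimators=200).fit(X_filtered, y)
    top = X_filtered.columns[model.feature_importances_.argsort()[::-1][:20]]

    # 4. Kernel PCA for a final, lower-dimensional representation.
    X_reduced = KernelPCA(n_components=10, kernel="rbf").fit_transform(X_filtered[top])
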
  3. Easy parallelism in Python

    1. HPC: Parsl, and funcX (federated function-as-a-service).
    2. Commercial or cloud: Ray.io (minimal example below).
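
    A minimal sketch of the "easy parallelism" idea using Ray; the square function is just a stand-in for real work.

    import ray

    ray.init()  # on a cluster, pass the head node's address instead

    @ray.remote
    def square(x):
        # Stand-in for an expensive function you want to fan out.
        return x * x

    # Launch 8 tasks in parallel and gather the results.
    futures = [square.remote(i) for i in range(8)]
    print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
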
  4. Serving

    1. Gradio & HF Spaces & Streamlit & PyDoc (minimal Gradio example below)
    2. Data and Learning Hub for Science (research software, Dan Katz)
    3. Triton, TensorRT, and ONNX: NVIDIA Triton Inference Server
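
    A minimal Gradio sketch of this serving style; classify() is a placeholder where a real model would go.

    import gradio as gr

    def classify(text):
        # Placeholder: call your trained model here and return class probabilities.
        return {"positive": 0.7, "negative": 0.3}

    demo = gr.Interface(fn=classify, inputs="text", outputs="label")
    demo.launch()  # serves a local web UI; the same script can run on HF Spaces
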
  5. Distributed training

    1. XGBoost - Dask.

    2. LightGBM - Dask or Spark.

    3. Horovod.

    4. PyTorch DDP (via PyTorch Lightning): see "Speed Up Model Training" in the PyTorch Lightning docs (1.7.0dev at the time of writing). A minimal Lightning DDP sketch appears after the glossary below.

    5. General idea: Pin certain layers to certain devices. Simple cases aren’t too bad in theory, but require a fair bit of specific knowledge about the model in question.

    6. Flavors of Parallelism

      1. Easy: XGBoost or LightGBM. Python code: Dask, Ray, Parsl.
      2. Medium: Data parallelism in Horovod, DeepSpeed, PyTorch DDP. GPU operations with Nvidia RAPIDS.
      3. Hard: model parallelism. Must-read resource: Model Parallelism (huggingface.co)
      4. My research: FS-DDP, DeepSpeed, pipeline parallelism, tensor parallelism, distributed-all-reduce, etc.
      5. Glossary
        1. DDP — Distributed Data Parallel
        2. PP - Pipeline Parallel (DeepSpeed)
        3. TP - Tensor Parallel
        4. VP - Voting parallel (usually decision tree async updates, e.g. LightGBM)
        5. MP - Model Parallel (Model sharding, and pinning layers to devices)
        6. FS-DDP - Fully Sharded Distributed Data Parallel
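
    A minimal sketch of data-parallel training (DDP) with PyTorch Lightning; LitClassifier and MyDataModule are hypothetical classes you would define yourself.

    import pytorch_lightning as pl

    # Hypothetical: a LightningModule wrapping your model, and a DataModule
    # wrapping your train/val dataloaders.
    model = LitClassifier()
    datamodule = MyDataModule()

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4,          # GPUs on this node
        strategy="ddp",     # Distributed Data Parallel: one model replica per GPU
        max_epochs=10,
    )
    trainer.fit(model, datamodule=datamodule)
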
  6. Fine-tune on out-of-distribution examples?

    1. TBD: What's the best way to fine-tune? 
    2. TBD: How do you monitor if your model is experiencing domain shift while in production? WandB Alerts is my best idea.
    3. Use fast.ai with your PyTorch or TF model, I think.
    4. A motivating example: the Permafrost Discovery Gateway has a great classifier for satellite images of Alaska, but needs to apply it to a slightly different domain. How can we best fine-tune the existing model? (One common recipe is sketched below.)
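
    Not the PDG's actual approach, just a minimal PyTorch sketch of one common fine-tuning recipe: freeze the pretrained backbone, swap the classification head, and train at a low learning rate on the new domain. num_classes is a made-up assumption.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Stand-in for the existing classifier: a pretrained backbone.
    model = models.resnet50(weights="DEFAULT")

    # Freeze the backbone so only the new head is updated at first.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the head for the new domain's classes.
    num_classes = 5
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Train only the new head with a small learning rate; later, unfreeze the top
    # few backbone layers and continue at an even smaller learning rate.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
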
  7. MLOps

    1. WandB.ai: highly recommended; a first-class tool during model development & data pre-processing (minimal logging example below).
    2. Spell
    3. ClearML: https://github.com/allegroai/clearml
    4. MLOps: What It Is, Why It Matters, and How to Implement It - neptune.ai
    5. The Framework Way is the Best Way: the pitfalls of MLOps and how to avoid them | ZenML Blog
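
    A minimal sketch of WandB experiment tracking (plus the alerts idea from the fine-tuning section); the project name and metrics are made up.

    import wandb

    run = wandb.init(project="text-analysis-demo", config={"lr": 1e-4, "epochs": 10})

    for epoch in range(run.config.epochs):
        val_f1 = 0.8  # placeholder: compute your real validation metric here
        wandb.log({"epoch": epoch, "val_f1": val_f1})

        # Alerts are one way to notice degraded metrics (e.g., domain shift) early.
        if val_f1 < 0.5:
            wandb.alert(title="Low validation F1", text=f"val_f1={val_f1:.2f} at epoch {epoch}")

    run.finish()
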
  8. HPC resources at UIUC

    1. NCSA Large: Delta (and Cerebras). External, but friendly: XSEDE (Bridges2).
    2. NCSA Small: Nano, Kingfisher, HAL (ppc64le).
    3. NCSA Modern: DGX, and an Arm-based system with two A100s (40 GB) (via hal-login3).
  9. Environments on HPC

    1. module load <TAB><TAB> — discover preinstalled environments
    2. Apptainer (previously called Singularity): Docker for HPC; requires minimal permissions.
      1. Write Dockerfile-style definition files for HPC (see Apptainer's definition-file syntax docs).
    3. Globus file transfer — my favorite. Wonderfully robust, parallel, lots of logging.
    4. Towards the perfect command-line file transfer: use xargs to parallelize rsync for file transfer and sync (NCSA wiki resource, plus a third-party blog post on the same trick).
    Rsync essential reference: see the Rsync Best Practices section below.




Conda Best Practices

When sharing Conda envs, consider: are you sharing with other people, or using Conda inside Docker?

...

Therefore, I recommend running mamba install first; if you get a “cannot resolve dependencies” error, fall back to conda install, which has more resolving power at the cost of being slow. If you have to pick one, conda is strictly more capable.

Cheap compute

The benefit: sudo access on modern hardware and clean environments. That's perfect for when dependency-hell makes you want to scream, especially when dealing with outdated HPC libraries.

  • Google Colab (free or paid)
  • Kaggle kernels (free)
  • LambdaLabs (my favorite for cheap GPU)
  • DataCrunch.io (my favorite for cheap GPU, especially top-of-the-line A100 80 GB cards)
  • Grid.ai (from the creator of PyTorch Lightning)
  • PaperSpace Gradient
  • GCP and Azure — lots of free credits floating around.
    • Azure is one of the few providers with 8x A100 (80 GB) systems, at roughly $38/hr. Expensive, but sometimes you need exactly that.

Best AI Courses

New topics

Streaming data for ML inference

Domain drift, explainable AI, dataset versioning (need to motivate; include in hyperparameter search).

Explainability tools:

Using GPUs for Speeding up ML (Vismayak Mohanarajan)

Rapids - cuDF and cuML

Colab Page - https://colab.research.google.com/drive/1bzL-mhGNvh7PF_MzsSgMmw9TQjyP6DCe?usp=sharing
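
Not taken from the linked Colab, just a minimal sketch of the RAPIDS idea: cuDF gives Pandas-like dataframes on the GPU and cuML gives scikit-learn-like estimators. The file name and columns are made up.

import cudf
from cuml.cluster import KMeans

# Read a CSV directly into GPU memory (hypothetical file and columns).
df = cudf.read_csv("measurements.csv")
features = df[["temperature", "pressure"]]

# Fit a clustering model entirely on the GPU, scikit-learn style.
kmeans = KMeans(n_clusters=4)
kmeans.fit(features)
df["cluster"] = kmeans.labels_
print(df.groupby("cluster").size())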

ML Pathways

<List of some popular ML learning pathways and a brief comment about each>

Conclusion

...


Rsync Best Practices

Rsync syntax is modeled after scp. Here is my favorite usage.


# My go-to command:
rsync -azP source destination

# flags explained
# -a is like scp's `-r`, but it also preserves metadata and symlinks. 
# -z = compression (more CPU usage, less network traffic) 
# -P flag combines the flags --progress and --partial. It enables resuming. 

# to truly keep in sync, add delete option 
rsync -a --delete source destination

# create backups 
rsync -a --delete --backup --backup-dir=/path/to/backups /path/to/source destination

# Good flags 
--exclude=pattern_to_exclude
-n = dry run: don't transfer anything, just print what WOULD have happened.

References