[This document is under construction]

Contributors

Provide a list of contributors who have contributed to this document either by writing sections or by sharing ideas and participating in discussions.

Introduction

Provide a brief introduction to this document, the goals, what this isn't, and the process used by the focus group to develop this document.

Traditional Machine Learning (Minu Mathew, Sandeep Puthanveetil Satheesan)

Provide a brief introduction to machine learning and list major areas within machine learning with short descriptions.

Introductory Courses/Blogs

Machine Learning, Andrew Ng, Stanford University/Coursera, https://www.coursera.org/learn/machine-learning/
Machine Learning Crash Course, Google, https://developers.google.com/machine-learning/crash-course/
Machine Learning Mastery, Jason Brownlee, https://machinelearningmastery.com/

Deep Learning - Text Analysis (Minu Mathew)

Natural language has no inherent structure, while computers work best with structured input. The first step is therefore to introduce some structure.

Regular Expressions:

Good for quick string comparisons and transformations.

Tokenization, normalization, and stemming - methods to add some structure
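
The preprocessing steps above can be sketched with Python's standard `re` module. This is a minimal illustration, not a production pipeline: the suffix list and the function names (`normalize`, `tokenize`, `crude_stem`, `preprocess`) are hypothetical, and the stemmer is deliberately crude compared with established ones such as the Porter stemmer.

```python
import re

# Hypothetical suffix list for a deliberately crude stemmer; real stemmers
# (for example Porter) use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text):
    """Split text into tokens made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text)


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, tokenize, and stem a raw string."""
    return [crude_stem(t) for t in tokenize(normalize(text))]


tokens = preprocess("The  cats were CHASING mice, and jumped!")
print(tokens)
```

Note that the crude stemmer turns "chasing" into "chas" rather than "chase", which shows why rule-based stemming is only an approximation of a word's root.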

Dimensionality reduction:

Captures the most important structure.

Converts a high-dimensional space to a low-dimensional one by preserving only the most important directions (the eigenvectors with the largest eigenvalues). Highly correlated dimensions are collapsed into a single dimension.
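
As a rough sketch of the idea above, the following pure-Python example reduces two highly correlated features to a single dimension by projecting onto the leading eigenvector of the covariance matrix. It assumes principal component analysis (PCA) is the intended technique and uses power iteration to find the eigenvector; the data and function names are made up for illustration, and in practice a numerical library would normally be used instead.

```python
import math


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvector(matrix, iterations=200):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: the second feature is roughly twice the first (highly correlated).
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = mean_center(data)
direction = top_eigenvector(covariance(centered))

# Project each 2-D point onto the principal direction to get one coordinate.
reduced = [sum(x * w for x, w in zip(row, direction)) for row in centered]
print(direction)
print(reduced)
```

The principal direction comes out close to (1, 2) normalized to unit length, so the single remaining coordinate retains almost all of the variance in the two original features.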

Methods to transform text into numeric form:

Vocab count / Bag of Words (BOW) - no contextual info kept

Remove stop words

One-hot encoding
Frequency count - no contextual info kept
TF-IDF - no contextual info kept
Word embeddings: preserve contextual information and capture the semantics of a word.

Learn word embeddings using n-grams (PyTorch, Keras)
Word2Vec (pre-trained word embeddings from Google) - based on word distributions and local context (window size)
GloVe (pre-trained, from Stanford) - based on global context
BERT
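
The text-to-numeric methods listed above can be contrasted in a short plain-Python sketch. The stop-word list, toy corpus, and function names are hypothetical, and the TF-IDF formula shown (term count divided by document length, times the log of the number of documents over document frequency) is one common variant among several. The last function only generates the (center, context) pairs within a window that embedding methods such as Word2Vec train on; it does not train an embedding.

```python
import math
from collections import Counter

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"the", "a", "on", "is"}
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat is a cat",
]

# Whitespace tokenization with stop words removed.
tokenized = [[w for w in d.split() if w not in STOP_WORDS] for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def one_hot(word):
    """Vector with a single 1 at the word's vocabulary index."""
    return [1 if w == word else 0 for w in vocab]


def count_vector(doc_tokens):
    """Bag-of-words frequency counts; word order is discarded."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocab]


def tfidf_vector(doc_tokens):
    """TF-IDF with tf = count / doc length and idf = log(N / document frequency)."""
    counts = Counter(doc_tokens)
    n_docs = len(tokenized)
    vector = []
    for w in vocab:
        tf = counts[w] / len(doc_tokens)
        df = sum(1 for doc in tokenized if w in doc)
        vector.append(tf * math.log(n_docs / df))
    return vector


def context_pairs(tokens, window=1):
    """Return (center, context) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


print(vocab)
print(one_hot("cat"))
print(count_vector(tokenized[2]))
print(tfidf_vector(tokenized[0]))
print(context_pairs(tokenized[0], window=1))
```

The first three representations assign a word the same value wherever it appears, which is why they keep no contextual information. The context pairs record which words occur near each other, which is the signal that embeddings are learned from.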

Models:

RNN

LSTM

CNN

Transformer architecture:

BERT model

XLNet (by Carnegie Mellon University and Google Brain) - BERT and GPT-3 generally work better

GPT-3 model
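
The transformer-based models listed above (BERT, XLNet, GPT-3) are all built around scaled dot-product attention. The following is a minimal pure-Python sketch of that single operation, softmax(Q K^T / sqrt(d_k)) V, on made-up toy vectors. It omits the learned projection matrices, multiple heads, and masking that real models use, and the function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy "token" vectors; self-attention uses the same vectors as Q, K, and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
print(weights)
print(outputs)
```

Each row of `weights` sums to 1 and shows how strongly one token attends to every other token, which is how these models build context-dependent representations.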

ML Ops (Kastan Day, Todd Nicholson)

Using GPUs for Speeding up ML (Vismayak Mohanarajan)

[DRAFT] Hands-on Machine Learning Study Materials for Research Software Engineers - Focus Group Report
