Deep Learning - Text Analysis (Minu Mathew)

Natural language has no fixed structure, but computers work best with structured input. The first step is to introduce some structure.

Regular Expressions :

Good for quick string matching and transformations.
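
As a minimal sketch of that kind of use, the snippet below relies only on Python's standard `re` module. The sample sentence and the deliberately loose email and phone patterns are made up for illustration.

```python
import re

# Hypothetical input with irregular spacing.
text = "Contact: Alice  at   alice@example.com, or call 555-0100!"

# Transformation: collapse runs of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", text)

# Extraction: pull out anything that looks like an email address
# (a loose pattern, not a full address validator).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", collapsed)

# Comparison: check whether a ddd-dddd phone-like pattern is present.
has_phone = re.search(r"\b\d{3}-\d{4}\b", collapsed) is not None

print(collapsed)
print(emails, has_phone)
```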

Tokenization, normalization and stemming - methods to add some structure to raw text
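
The three steps can be sketched in plain Python as follows. The function names are hypothetical, and `toy_stem` is a crude suffix stripper used only for illustration. It is not the Porter algorithm; in practice a library stemmer would likely be used.

```python
import re
from typing import List

def normalize(text: str) -> str:
    """Lowercase, then replace anything that is not a letter, digit or space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def tokenize(text: str) -> List[str]:
    """Split on whitespace."""
    return text.split()

def toy_stem(token: str) -> str:
    """Strip a few common English suffixes (illustrative stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

sentence = "The cats were jumping over the fences!"
tokens = tokenize(normalize(sentence))
stems = [toy_stem(t) for t in tokens]

print(tokens)
print(stems)
```

Note that stems need not be dictionary words ("fences" becomes "fenc" here); the point is only that related word forms map to the same string.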

Dimensionality reduction :

Capture the most important structure.

Convert a high-dimensional space to a low-dimensional one by keeping only the most important directions (the eigenvectors with the largest eigenvalues). Highly correlated dimensions are collapsed into a single dimension.
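
A minimal sketch of this idea, assuming a PCA-style reduction from two correlated features to one. The toy data is invented, and power iteration is used here only to keep the example dependency-free; a numerical library would normally compute the eigenvectors.

```python
import math

# Toy data: two highly correlated features (y is roughly 2 * x).
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.0)]

# 1. Center each feature at zero mean.
n = len(points)
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# 2. Build the entries of the 2x2 covariance matrix.
cxx = sum(x * x for x, _ in centered) / (n - 1)
cyy = sum(y * y for _, y in centered) / (n - 1)
cxy = sum(x * y for x, y in centered) / (n - 1)

# 3. Power iteration: repeatedly multiply by the covariance matrix to
#    converge on the eigenvector with the largest eigenvalue.
vx, vy = 1.0, 0.0
for _ in range(100):
    wx = cxx * vx + cxy * vy
    wy = cxy * vx + cyy * vy
    norm = math.hypot(wx, wy)
    vx, vy = wx / norm, wy / norm

# 4. Project every 2-D point onto that single direction.
projected = [x * vx + y * vy for x, y in centered]

# Fraction of the total variance kept by the one retained dimension.
top_eigenvalue = vx * (cxx * vx + cxy * vy) + vy * (cxy * vx + cyy * vy)
explained = top_eigenvalue / (cxx + cyy)

print(projected)
print(round(explained, 4))
```

Because the two features move together, almost all of the variance survives the reduction to one dimension.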

Methods to transform text to numeric form :

Vocab count / Bag of Words (BOW) - no contextual info kept

Remove stop words

One-hot encoding
Frequency count - no contextual info kept
TF-IDF - no contextual info kept
Word Embeddings : preserve contextual information. Get the semantics of a word.

Learn word embeddings using n-grams (PyTorch, Keras)
Word2Vec (pre-trained word embeddings from Google) - based on word distributions and local context (window size).
GloVe (pre-trained from Stanford) - based on global co-occurrence statistics
BERT
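
The count-based representations in the list above (bag of words and TF-IDF) can be sketched in plain Python. The three toy documents are invented, and the weighting `tf * log(N / df)` is only one common TF-IDF variant; libraries typically add smoothing and normalization.

```python
import math
from collections import Counter

# Hypothetical mini-corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Fixed vocabulary: one vector position per distinct word.
vocab = sorted({word for doc in tokenized for word in doc})

def bag_of_words(tokens):
    """Raw term counts in vocabulary order; word order is discarded."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

# Document frequency: number of documents that contain each word.
df = Counter(word for doc in tokenized for word in set(doc))

def tf_idf(tokens):
    """Term frequency times log inverse document frequency (unsmoothed)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    return [
        (counts[w] / len(tokens)) * math.log(n_docs / df[w])
        for w in vocab
    ]

bow_vectors = [bag_of_words(doc) for doc in tokenized]
tfidf_vectors = [tf_idf(doc) for doc in tokenized]

print(vocab)
print(bow_vectors[0])
print([round(v, 3) for v in tfidf_vectors[0]])
```

In the first document "the" occurs twice and "cat" once, yet TF-IDF weights "cat" higher because "the" also appears in another document. Neither vector records which words were adjacent, which is the sense in which no contextual information is kept.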

Models :

RNN

LSTM

CNN

Transformer architecture :

BERT model

XLNet (by Carnegie Mellon University and Google Brain) - BERT and GPT-3 work better in general

GPT-3 model

