...
Deep Learning - Text Analysis (Minu Mathew)
Natural language has no fixed structure. Computers work best with structured input, so the first step is to introduce some structure.
Regular Expressions :
Good for quick string comparisons and transformations.
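A minimal sketch of both uses with Python's standard `re` module. The sample string and the simplified email pattern are assumptions made up for illustration, not part of the talk.

```python
import re

# Hypothetical input string, invented for this example.
text = "Contact:  Jane   Doe, jane.doe@example.org (2022-03-14)"

# Comparison: does the string contain something shaped like YYYY-MM-DD?
has_date = re.search(r"\d{4}-\d{2}-\d{2}", text) is not None

# Extraction: a deliberately simplified email pattern (not RFC-complete).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

# Transformation: collapse every run of whitespace into one space.
cleaned = re.sub(r"\s+", " ", text)

print(has_date)
print(emails)
print(cleaned)
```

The same three calls (`re.search`, `re.findall`, `re.sub`) cover most quick checks and clean-ups before heavier processing.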
Tokenization, Normalization and stemming - methods to add some structure
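A rough sketch of the three steps in plain Python. The `naive_stem` helper is a toy suffix stripper written for this note (it is not the Porter stemmer or any library's algorithm), and the sample sentence is made up.

```python
import re

def normalize(text):
    """Lowercase, then replace everything except a-z, 0-9 and whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def tokenize(text):
    """Split normalized text on whitespace."""
    return text.split()

def naive_stem(token):
    """Toy stemmer (assumption): strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

sentence = "The computers were parsing texts, and parsed them quickly!"
tokens = tokenize(normalize(sentence))
stems = [naive_stem(t) for t in tokens]

print(tokens)
print(stems)
```

Here "parsing" and "parsed" both reduce to the same stem, which is the kind of structure these steps are meant to add. A real pipeline would more likely use an established stemmer or lemmatizer.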
Dimensionality reduction :
Capture the most important structure.
Convert a high-dimensional space to a low-dimensional one by keeping only the most important directions (the eigenvectors with the largest eigenvalues, as in PCA). Highly correlated dimensions are merged into a single dimension.
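A small sketch of this idea, assuming PCA is the method meant: two highly correlated dimensions are reduced to one by projecting onto the top eigenvector of the covariance matrix. The data points are invented, and power iteration is used here only to keep the example dependency-free; a library routine would normally be used instead.

```python
import math

def mean_center(points):
    """Subtract the per-dimension mean from every point."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [[p[d] - means[d] for d in range(dims)] for p in points]

def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: approaches the eigenvector with the largest eigenvalue."""
    dims = len(matrix)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up 2-D data where the second dimension is roughly twice the first.
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = mean_center(points)
component = top_eigenvector(covariance(centered))

# Each 2-D point becomes a single coordinate along the principal direction.
projected = [sum(a * b for a, b in zip(row, component)) for row in centered]

print(component)
print(projected)
```

The principal direction comes out close to (1, 2) normalized, so the single projected coordinate keeps almost all of the variation in the original two dimensions.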
Methods to transform text into numeric form :
- Vocab count / Bag of Words (BOW) - no contextual info kept
- Remove stop words (a preprocessing step applied before counting, not an encoding itself)
- One-hot encoding
- Frequency count - no contextual info kept
- TF-IDF - no contextual info kept
- Word Embeddings : preserve contextual information. Get the semantics of a word.
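The count-based encodings in the list above can be sketched in plain Python as follows. The three documents and the stop-word set are toy assumptions, and the TF-IDF variant shown (term frequency times the natural log of N/df, no smoothing) is one common formulation among several.

```python
import math
from collections import Counter

# Toy corpus and stop-word list, invented for illustration.
docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
stop_words = {"the", "on", "and"}

# Stop-word removal, then a sorted vocabulary with one index per word.
tokenized = [[w for w in d.split() if w not in stop_words] for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def bag_of_words(tokens):
    """Per-document word counts in vocabulary order; word order is lost."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tf_idf(tokens):
    """Term frequency times log(N / document frequency), unsmoothed."""
    counts = Counter(tokens)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)
        vec.append(tf * math.log(len(tokenized) / df))
    return vec

print(vocab)
print(one_hot("dog"))
print(bag_of_words(tokenized[0]))
print(tf_idf(tokenized[0]))
```

In the first document, "sat" gets a lower TF-IDF weight than "cat" because it also appears in the second document. None of these vectors record where a word occurs or what surrounds it, which is the gap word embeddings are meant to fill.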
Models :
RNN
LSTM
CNN
Transformer architecture :
BERT model
XLNet (by Carnegie Mellon University and Google Brain) - BERT and GPT-3 work better in general
GPT-3 model:
ML Ops (Kastan Day, Todd Nicholson)
...