...

  • Vocabulary count / Bag of Words (BOW) - no contextual info kept
    • Use CountVectorizer from sklearn or TF-IDF (better); see the first sketch after this list
    • Remove stop words
  • One-hot encoding
  • Frequency count - no contextual info kept
  • TF-IDF - no contextual info kept
  • Word embeddings: preserve contextual information and capture the semantics of a word.
    • Resource on implementation of various embeddings
    • Learn word embeddings using n-grams (PyTorch, Keras). Here the word embeddings are learned during the training phase, hence they are specific to the training set (a PyTorch sketch follows this list).
    • Word2Vec (pre-trained word embeddings from Google) - based on word distributions and local context (window size)
    • GloVe (pre-trained from Stanford) - based on global co-occurrence statistics (see the gensim sketch after this list)
    • BERT embeddings
    • GPT-3 embeddings
  • Using word embeddings:
    • Learn them from scratch (not recommended)
    • Reuse them as-is (check what dataset the embeddings have been trained on)
    • Reuse + fine-tune (see the Keras sketch after this list)
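
A minimal sketch of BOW counts vs. TF-IDF weighting with scikit-learn, including stop-word removal; the toy corpus is made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

# Bag of Words: raw term counts, English stop words removed
bow = CountVectorizer(stop_words="english")
counts = bow.fit_transform(corpus)
print(bow.get_feature_names_out())  # vocabulary after stop-word removal
print(counts.toarray())             # one count vector per document

# TF-IDF: down-weights terms that appear in many documents
tfidf = TfidfVectorizer(stop_words="english")
weights = tfidf.fit_transform(corpus)
print(weights.toarray())
```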
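
A minimal sketch of learning word embeddings from n-gram context in PyTorch (predict the next word from the previous two); the toy corpus, context size, and embedding dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

text = "the cat sat on the mat".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
CONTEXT, EMB_DIM = 2, 10

# (two context words -> next word) training pairs
data = [(text[i:i + CONTEXT], text[i + CONTEXT])
        for i in range(len(text) - CONTEXT)]

class NGramModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, context):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)       # the embeddings being learned
        self.fc = nn.Linear(context * emb_dim, vocab_size)

    def forward(self, ctx_ids):
        return self.fc(self.emb(ctx_ids).view(1, -1))

model = NGramModel(len(vocab), EMB_DIM, CONTEXT)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):
    for ctx, target in data:
        ctx_ids = torch.tensor([vocab[w] for w in ctx])
        loss = loss_fn(model(ctx_ids), torch.tensor([vocab[target]]))
        optim.zero_grad()
        loss.backward()
        optim.step()

# model.emb.weight now holds embeddings specific to this training corpus
print(model.emb.weight[vocab["cat"]])
```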
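
A minimal sketch of reusing the pre-trained Word2Vec and GloVe vectors via gensim's downloader; the identifiers are the standard gensim-data model names (downloads are large):

```python
import gensim.downloader as api

# Word2Vec trained by Google on Google News (300-d, local window context)
w2v = api.load("word2vec-google-news-300")

# GloVe trained at Stanford on Wikipedia + Gigaword (global co-occurrence)
glove = api.load("glove-wiki-gigaword-100")

print(w2v["king"].shape)                   # (300,)
print(glove.most_similar("king", topn=3))  # nearest neighbours in GloVe space
```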
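
A minimal sketch of the three usage strategies as Keras Embedding layers; the pre-trained matrix here is a random stand-in, and in practice it would be assembled from Word2Vec/GloVe vectors aligned to your vocabulary:

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE, EMB_DIM = 1000, 50
pretrained = np.random.rand(VOCAB_SIZE, EMB_DIM)  # stand-in for a real pre-trained matrix

# 1. Learn from scratch: random init, updated during training (not recommended)
learned = keras.layers.Embedding(VOCAB_SIZE, EMB_DIM)

# 2. Reuse as-is: pre-trained weights, frozen
reused = keras.layers.Embedding(
    VOCAB_SIZE, EMB_DIM,
    embeddings_initializer=keras.initializers.Constant(pretrained),
    trainable=False,
)

# 3. Reuse + fine-tune: pre-trained weights, updated during training
finetuned = keras.layers.Embedding(
    VOCAB_SIZE, EMB_DIM,
    embeddings_initializer=keras.initializers.Constant(pretrained),
    trainable=True,
)
```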



Models:

  1. Recurrent Neural Network (RNN)

...