...

Transformer architecture:

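Attention mechanism: (from the "Attention Is All You Need" paper, Vaswani et al., 2017) - weights = softmax(Q K^T / sqrt(d_k)), output = weights · V, where Q, K and V are the query, key and value matrices and d_k is the key dimension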
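Scaled dot-product attention, as defined in that paper, computes softmax(Q K^T / sqrt(d_k)) V: each query is scored against every key, the scores are scaled by sqrt(d_k), normalized with a row-wise softmax, and used to take a weighted average of the value rows. Below is a minimal pure-Python sketch that stores matrices as lists of rows. The function names are illustrative rather than any library's API, and the sketch leaves out masking, multiple heads and the learned projections that produce Q, K and V.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m)
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """q: n x d_k, k: m x d_k, v: m x d_v.

    Returns (output, weights): output is n x d_v and weights is n x m,
    one attention distribution over the m keys per query.
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))  # raw query-key dot products
    scores = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    output = matmul(weights, v)  # weighted average of value rows
    return output, weights
```

For example, `scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` puts more weight on the first key, because it points in the same direction as the query. The output therefore lies closer to the first value row.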

BERT model
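BERT is pretrained with a masked language modeling objective. Some of the input tokens are selected as prediction targets. Of the selected tokens, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, and the model learns to recover the original token at each selected position. The sketch below simplifies selection by choosing each token independently with probability 15%; BERT's reference code picks a fixed number of positions per sequence instead. The function name `mask_tokens` and the use of `None` to mark positions excluded from the loss are illustrative assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for a BERT-style masked-LM example.

    labels[i] is the original token at selected positions and None elsewhere
    (None = position not used in the MLM loss; a hypothetical convention).
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:  # simplified: independent per-token selection
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels
```

Passing a seeded `random.Random` makes the masking reproducible. With `mask_prob=0.15`, roughly 12% of positions (15% × 80%) end up as [MASK].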

XLNet (by Google Brain and Carnegie Mellon University) - BERT and GPT-3 work better in general

...