...
Transformer architecture:
Attention mechanism: ("Attention Is All You Need" paper) - Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
BERT model
XLNet (by Google Brain and CMU) - reported to outperform BERT on many NLP benchmarks
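The attention formula above can be sketched directly in NumPy. This is a minimal illustration of scaled dot-product attention (single head, no masking, no learned projections), not a full Transformer layer; the function name and toy shapes are my own choices for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # row-wise softmax turns scores into attention weights summing to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output is a weighted average of the value vectors
    return weights @ V

# toy example: 2 queries, 3 keys/values, dimension d_k = 4
Q = np.random.rand(2, 4)
K = np.random.rand(3, 4)
V = np.random.rand(3, 4)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Each output row is a convex combination of the rows of V, so every entry stays within the range of the corresponding V column.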
...