Discussion about this post

Mihály Nemes:

An extremely comprehensible discussion of an extremely complex topic.

Omar AlSuwaidi:

This topic is very relevant and crucial, thanks for shedding light on it. Quick question: while GloVe and Word2Vec directly output a static embedding vector per word, BERT and other Transformer-based models output a sequence of per-token vectors and compute the contextualized word embeddings implicitly. So *what* vector exactly is used from these Transformer-based models as the contextualized embedding (is it one of the Q, K, V vectors)?
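(For context, a minimal sketch, not from the original post, of how contextualized token embeddings are commonly pulled out of a Transformer encoder using the Hugging Face transformers library: the vectors typically used are the model's final-layer hidden states, one per token, rather than the intermediate Q, K, V projections. The model name "bert-base-uncased" and the example sentence are illustrative choices.)

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative model choice; any BERT-style encoder works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, seq_len, hidden_size).
# Each row is the final-layer hidden state for one token, which is what
# is commonly taken as that token's contextualized embedding.
token_embeddings = outputs.last_hidden_state[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for tok, vec in zip(tokens, token_embeddings):
    print(tok, vec.shape)
```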
