Interactive Quiz
Test your knowledge!
1. What is the main difference between an autoregressive model (like GPT) and an encoder model (like BERT) in natural language processing?
A. The autoregressive model predicts masked words in a sentence, while the encoder model predicts the next word.
B. The autoregressive model generates a token based solely on previous tokens, whereas the encoder model considers both left and right context of a token.
C. The encoder model works only for translation, while the autoregressive model works for all NLP tasks.
D. The autoregressive model uses a full architecture with cross-attention, unlike the encoder model.
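For reference on question 1, a minimal PyTorch sketch of the two training objectives (token ids and the 15% masking rate are illustrative assumptions, not from the course):

import torch

tokens = torch.tensor([5, 2, 9, 7, 3])

# Autoregressive (GPT-style) objective: predict the next token from the previous ones only.
ar_inputs, ar_targets = tokens[:-1], tokens[1:]

# Masked (BERT-style) objective: hide some tokens and predict them using both left and right context.
MASK_ID = 0                               # assumption: 0 is the [MASK] id
mask = torch.rand(tokens.shape) < 0.15    # mask roughly 15% of positions
mlm_inputs = tokens.clone()
mlm_inputs[mask] = MASK_ID
mlm_targets = tokens                      # only the masked positions are scored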
2. What is the role of the Query (Q), Key (K), and Value (V) matrices in the self-attention mechanism of a transformer?
A. Q represents what each position is looking for, K what it contains, and V the actual value to extract if deemed relevant.
B. Q is a masked version of the input, K is a normalized version, and V is the final output.
C. Q, K, and V are identical matrices used to compute a weighted average.
D. Q contains the weights, K contains the biases, and V contains the network activations.
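For reference on question 2, a minimal single-head self-attention sketch in PyTorch (dimensions and layer names are illustrative assumptions):

import torch
import torch.nn.functional as F

B, T, C = 1, 8, 32                      # batch, sequence length, embedding size
x = torch.randn(B, T, C)

Wq, Wk, Wv = (torch.nn.Linear(C, C, bias=False) for _ in range(3))
q = Wq(x)                               # what each position is looking for
k = Wk(x)                               # what each position contains
v = Wv(x)                               # what each position offers if selected

scores = q @ k.transpose(-2, -1) / C ** 0.5   # (B, T, T) affinities between positions
weights = F.softmax(scores, dim=-1)
out = weights @ v                             # (B, T, C) weighted mix of the values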
3. In the transformer architecture, what is the purpose of residual connections between attention and feed-forward layers?
A. They allow increasing the model size without changing its depth.
B. They facilitate the training of deep models by preventing the vanishing gradient problem.
C. They normalize activations between layers.
D. They mask future tokens in the decoder.
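For reference on question 3, a sketch of a transformer block with residual (skip) connections; the "x +" terms let gradients flow directly through a deep stack (layer sizes are illustrative assumptions):

import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]   # residual around attention
        x = x + self.ff(self.ln2(x))                        # residual around feed-forward
        return x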
4. In the implementation of a bigram language model, what is the main limitation that explains the poor quality of generated texts?
A. It predicts the next character based solely on a single context character.
B. It uses an encoder architecture instead of a decoder.
C. It applies incorrect masking of future tokens.
D. It lacks the normalization layer (layer norm).
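For reference on question 4, a sketch of a character-level bigram model: each character's embedding directly gives the logits of the next character, so only one character of context is ever used (class and variable names are illustrative):

import torch
import torch.nn as nn

class BigramLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx):               # idx: (B, T) character ids
        return self.table(idx)            # (B, T, vocab_size) logits for the next character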
5. What is the main difference between the self-attention layer used in a decoder and the one in a transformer encoder?
A. The encoder layer applies a lower triangular mask, the decoder layer does not.
B. The decoder layer masks future tokens via a lower triangular matrix, whereas the encoder layer does not mask.
C. The encoder layer uses cross-attention, the decoder layer does not.
D. The decoder layer uses multiple attention heads, the encoder layer uses only one.
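For reference on question 5, a sketch of the causal mask used on the decoder side (sizes are illustrative assumptions):

import torch
import torch.nn.functional as F

T = 5
scores = torch.randn(T, T)                       # raw Q K^T / sqrt(d) affinities
tril = torch.tril(torch.ones(T, T))              # 1s on and below the diagonal
masked = scores.masked_fill(tril == 0, float("-inf"))
weights = F.softmax(masked, dim=-1)              # row t only attends to positions <= t
# An encoder block skips the masked_fill, so every token attends to the full sequence.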
6. In the Vision Transformer (ViT), how are images processed before being passed into the transformer?
A. Images are transformed into sequences of individual pixels.
B. Images are divided into fixed patches (e.g., 16x16), flattened, and then projected into an embedding space.
C. Images are transformed into feature maps by a CNN before the transformer.
D. Images are converted to grayscale before being processed.
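For reference on question 6, a sketch of ViT-style patch embedding (image size, patch size, and embedding dimension are illustrative assumptions):

import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                   # (B, C, H, W)
patch, dim = 16, 768

# A Conv2d with kernel = stride = patch size applies one linear projection per 16x16 patch.
to_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
tokens = to_embed(img)                              # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)          # (1, 196, 768): one token per patch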
7. What is the purpose of the 'class token' in the Vision Transformer?
A. To enable text generation from an image.
B. To provide a special token dedicated to classification, avoiding the need to aggregate all transformer outputs.
C. To replace position embedding in the transformer.
D. To enable masking of patches in the image.
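For reference on question 7, a sketch of the class token: a learnable embedding prepended to the patch tokens, whose output alone feeds the classification head (sizes are illustrative assumptions):

import torch

B, N, dim = 1, 196, 768
patch_tokens = torch.randn(B, N, dim)

cls_token = torch.nn.Parameter(torch.zeros(1, 1, dim))
x = torch.cat([cls_token.expand(B, -1, -1), patch_tokens], dim=1)   # (B, 197, dim)
# ... transformer blocks run on x ...
cls_out = x[:, 0]          # (B, dim): the single vector used for classification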
8. What is the main innovation of the Swin Transformer compared to the Vision Transformer?
A. Use of an encoder and a decoder in the same architecture.
B. Attention computed only within local windows arranged hierarchically, with shifted windowing between successive layers.
C. Switching from multi-head attention to a single attention head.
D. Exclusive use of convolutions instead of feed-forward layers.
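For reference on question 8, a sketch of window partitioning and the cyclic shift applied between consecutive Swin blocks (feature-map and window sizes are illustrative assumptions):

import torch

B, H, W, C = 1, 8, 8, 32     # batch, height, width, channels (in patch units)
window = 4                   # each window holds 4x4 patches

def partition_windows(x, w):
    # (B, H, W, C) -> (num_windows*B, w*w, C): attention is computed inside each window only.
    B, H, W, C = x.shape
    x = x.view(B, H // w, w, W // w, w, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)

x = torch.randn(B, H, W, C)

# Layer l: attention within regular windows.
windows = partition_windows(x, window)            # (4, 16, 32)

# Layer l+1: cyclically shift the feature map by half a window before partitioning,
# so information can flow across the previous window boundaries.
shifted = torch.roll(x, shifts=(-window // 2, -window // 2), dims=(1, 2))
shifted_windows = partition_windows(shifted, window)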
9. What is the advantage of relative position embedding in the Swin Transformer?
A. It replaces the attention mechanism.
B. It better captures spatial relationships between patches and adapts the model to different image resolutions.
C. It masks irrelevant patches in a window.
D. It increases the model's capacity to handle large images.
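For reference on question 9, a sketch of a relative position bias table for one window: each pair of positions looks up a learned bias indexed by their relative offset, which is added to the attention scores before the softmax (window size and head count are illustrative assumptions):

import torch

M = 7            # window size: 7x7 patches per window
num_heads = 3

# One learnable bias per possible relative offset (2M-1 choices per axis) and per head.
bias_table = torch.nn.Parameter(torch.zeros((2 * M - 1) ** 2, num_heads))

# Precompute, for every pair of positions inside the window, the index of its relative offset.
coords = torch.stack(torch.meshgrid(torch.arange(M), torch.arange(M), indexing="ij"))  # (2, M, M)
coords = coords.flatten(1)                                   # (2, M*M)
rel = coords[:, :, None] - coords[:, None, :]                # (2, M*M, M*M) relative (dy, dx)
rel = rel.permute(1, 2, 0) + (M - 1)                         # shift offsets to start at 0
rel_index = rel[:, :, 0] * (2 * M - 1) + rel[:, :, 1]        # (M*M, M*M)

# At attention time, the gathered bias is added to Q K^T / sqrt(d) before the softmax.
bias = bias_table[rel_index].permute(2, 0, 1)                # (num_heads, M*M, M*M)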
10. What is the training principle of the CLIP model that associates text and image?
A. Supervised training with precise annotation of objects in images.
B. Contrastive training on positive (image-description) pairs and negative pairs, maximizing similarity for matching pairs and minimizing it for mismatched ones.
C. Generation of images from textual descriptions.
D. Prediction of the next text token from an image.
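For reference on question 10, a sketch of a CLIP-style contrastive objective on a batch of N matching (image, text) pairs: the diagonal of the similarity matrix holds the positive pairs, everything else acts as negatives (batch size, embedding dimension, and temperature are illustrative assumptions):

import torch
import torch.nn.functional as F

N, dim = 4, 512
img_emb = F.normalize(torch.randn(N, dim), dim=-1)   # stand-in for image encoder outputs
txt_emb = F.normalize(torch.randn(N, dim), dim=-1)   # stand-in for text encoder outputs

logits = img_emb @ txt_emb.t() / 0.07                # cosine similarities / temperature
labels = torch.arange(N)                             # image i matches description i
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2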