Interactive Quiz
Test your knowledge!
1. What is the main advantage of transfer learning in deep learning?
A. It allows freely modifying the architecture of the pre-trained model.
B. It allows training a model from scratch more quickly.
C. It speeds up training and improves performance by reusing an already trained model.
D. It always requires more data than traditional training.
2. What is the main difference between transfer learning and fine-tuning?
A. Transfer learning trains a new model without using a pre-trained model; fine-tuning uses a pre-trained model.
B. Fine-tuning involves retraining only certain layers of a pre-trained model, while transfer learning can retrain all or part of the model.
C. Fine-tuning modifies the model's architecture; transfer learning does not.
D. Transfer learning can only be used on identical tasks; fine-tuning can be used on different tasks.
3. In fine-tuning, how do you choose the number of layers to retrain?
A. Always retrain all layers for better performance.
B. The less data you have, the more layers you retrain.
C. The more similar the tasks, the fewer layers you retrain.
D. The number of retrained layers has no influence.
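Questions 1 to 3 contrast transfer learning with fine-tuning. As a reminder of what "freezing" and "retraining layers" look like in practice, here is a minimal sketch, assuming PyTorch and torchvision with an ImageNet-pretrained ResNet-18; the 5-class head and the learning rate are illustrative choices, not values from the course:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet (the transfer learning starting point).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Transfer learning: freeze every pre-trained layer so only the new head trains.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the new task (e.g., 5 classes).
model.fc = nn.Linear(model.fc.in_features, 5)

# Fine-tuning: additionally unfreeze the last residual block, e.g. when the
# target task is less similar to ImageNet (the less similar the task, or the
# more data you have, the more layers you typically retrain).
for param in model.layer4.parameters():
    param.requires_grad = True

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```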
4. Which dataset is often used to pre-train image classification models in transfer learning?
A. MNIST
B. CIFAR-10
C. ImageNet
D. COCO
5. What is the main objective of knowledge distillation?
A. Increase the model size to improve accuracy.
B. Transfer knowledge from a high-performing model (teacher) to a smaller model (student).
C. Train a model without using labels.
D. Reduce the number of layers in a deep network.
6. Why does knowledge distillation often improve the performance of the student model?
A. Because the student uses only labels and not the teacher's predictions.
B. Because the student learns a more informative probability distribution than labels alone.
C. Because the student is trained without a loss function.
D. Because the teacher is smaller than the student.
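Questions 5 and 6 describe classic knowledge distillation. A minimal sketch of the usual distillation loss, assuming PyTorch; the temperature T and the weighting alpha are illustrative defaults, not prescribed values:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: the teacher's softened distribution carries more
    # information than the hard labels alone (relative class similarities).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```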
7. In knowledge distillation applied to unsupervised anomaly detection, what is the main role of the student model?
A. Directly predict the class of images.
B. Learn to reproduce the internal representations (feature maps) of the teacher model on defect-free data to detect anomalies by difference.
C. Generate synthetic data for training.
D. Remain frozen (untrained) throughout the process.
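Question 7 refers to student-teacher feature matching for anomaly detection. A minimal sketch, assuming PyTorch and hypothetical `teacher` and `student` networks that both return a feature map of shape [B, C, H, W]:

```python
import torch
import torch.nn.functional as F

def train_step(student, teacher, normal_images, optimizer):
    teacher.eval()  # the teacher stays frozen
    with torch.no_grad():
        t_feat = teacher(normal_images)
    s_feat = student(normal_images)
    # The student learns to reproduce the teacher's features on defect-free data.
    loss = F.mse_loss(s_feat, t_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def anomaly_map(student, teacher, image):
    # At test time, regions where student and teacher disagree are flagged
    # as anomalous, since the student only learned to match on normal data.
    with torch.no_grad():
        t_feat, s_feat = teacher(image), student(image)
    return (t_feat - s_feat).pow(2).mean(dim=1)  # per-pixel anomaly score
```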
8. What distinguishes the BERT architecture from GPT?
A. BERT is a unidirectional transformer; GPT is bidirectional.
B. BERT is based on the transformer encoder block and is bidirectional; GPT uses the decoder block and is unidirectional.
C. BERT cannot be fine-tuned; GPT can.
D. BERT uses only positional embeddings.
9. Which training tasks does BERT use to learn linguistic representations?
A. Next word prediction only.
B. Masked language modeling (predicting masked words) and next sentence prediction.
C. Machine translation.
D. Image classification.
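Question 9 concerns BERT's pre-training objectives. A minimal sketch of masked language modeling in action, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint:

```python
from transformers import pipeline

# BERT fills in the masked word using context from both directions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```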
10. In token-level classification with BERT (e.g., NER), why is a [CLS] token used at the beginning of the sequence?
A. To indicate the end of the sequence.
B. To extract a global representation useful for sentence-level classification.
C. To mask tokens.
D. To replace all tokens with a single one.
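Question 10 touches on the role of the [CLS] token. A minimal sketch, assuming the Hugging Face transformers library and bert-base-uncased, contrasting the [CLS] vector (a global, sentence-level representation) with the per-token vectors used for token-level tasks such as NER:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the capital of France.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

hidden = outputs.last_hidden_state      # shape: [1, seq_len, 768]
cls_vector = hidden[:, 0, :]            # [CLS]: global representation for sentence-level classification
token_vectors = hidden[:, 1:-1, :]      # per-token vectors used for NER-style tagging
print(cls_vector.shape, token_vectors.shape)
```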