Interactive Quiz
Test your knowledge!
1. What is the main effect of initializing weights too large in a network with the tanh activation function?
A. Neurons have activations close to zero, facilitating gradient propagation.
B. Neurons have activations close to +1 or -1, which causes a very small gradient.
C. The loss is lower at initialization, which speeds up training.
D. Biases become useless and can be initialized to random values.
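A minimal sketch of the saturation effect behind question 1 (PyTorch assumed; the layer sizes and the scale factor are illustrative, not from the quiz): with over-large weights, the pre-activations are huge and tanh pins most activations near +1 or -1.

import torch

torch.manual_seed(0)
x = torch.randn(32, 100)                # a batch of 32 examples with 100 features
w_large = torch.randn(100, 200) * 5.0   # deliberately over-scaled weights
h = torch.tanh(x @ w_large)             # pre-activations are huge, so tanh saturates
print((h.abs() > 0.99).float().mean())  # fraction of activations stuck near +/-1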
2. Which simple method helps reduce an abnormally high loss at the initialization of a network?
A. Initialize all weights to zero.
B. Multiply the weights by a small value (e.g., 0.01) and initialize biases to zero.
C. Use a sigmoid activation function instead of tanh.
D. Increase the batch size to stabilize the loss.
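A minimal sketch of the fix discussed in question 2 (PyTorch assumed; the hidden size and vocabulary size are hypothetical): shrinking the output-layer weights and zeroing the biases makes the initial logits near zero, so the initial cross-entropy loss lands close to the uniform-guess baseline instead of being abnormally high.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 27                          # hypothetical number of output classes
h = torch.randn(32, 200)                 # stand-in hidden activations
W = torch.randn(200, vocab_size) * 0.01  # small initial weights
b = torch.zeros(vocab_size)              # zero initial biases
logits = h @ W + b                       # logits are all close to zero
y = torch.randint(0, vocab_size, (32,))
print(F.cross_entropy(logits, y).item())     # close to the uniform baseline
print(torch.log(torch.tensor(27.0)).item())  # -log(1/27), about 3.30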
3. Why should the weights of a neural network not be initialized to zero?
A. Because it leads to a uniform distribution of activations.
B. Because it prevents differentiation between neurons, blocking learning.
C. Because it increases the initial loss.
D. Because it speeds up training too much and causes overfitting.
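A minimal sketch of the problem behind question 3 (PyTorch assumed; the shapes are illustrative): with all weights at zero, the hidden activations are tanh(0) = 0 and no gradient reaches the weights, so the neurons never differentiate from one another and nothing is learned.

import torch

x = torch.randn(8, 10)
y = torch.randn(8, 3)
W1 = torch.zeros(10, 4, requires_grad=True)  # all-zero hidden weights
W2 = torch.zeros(4, 3, requires_grad=True)   # all-zero output weights

h = torch.tanh(x @ W1)                  # every hidden unit outputs exactly 0
loss = ((h @ W2) - y).pow(2).mean()
loss.backward()
print(W1.grad.abs().max(), W2.grad.abs().max())  # both 0: no signal to separate the neurons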
4. What is the derivative of the tanh function, and why is it important for the problem of dead neurons?
A. tanh'(t) = 1 - tanh(t), meaning the gradient is always high.
B. tanh'(t) = 1 - tanh(t)^2, so when tanh(t) is close to ±1 the gradient is very small.
C. tanh'(t) = t, which prevents gradient propagation.
D. tanh'(t) = exp(-t), which causes gradient explosion.
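A minimal sketch checking the identity in question 4 (PyTorch assumed; the sample inputs are illustrative): autograd's gradient for tanh matches 1 - tanh(t)^2, and it collapses toward zero once the activation saturates near ±1.

import torch

t = torch.tensor([0.0, 1.0, 3.0, 6.0], requires_grad=True)
out = torch.tanh(t)
out.sum().backward()
print(out.detach())           # approaches +/-1 as |t| grows
print(t.grad)                 # gradient computed by autograd
print((1 - out**2).detach())  # matches 1 - tanh(t)^2; near zero once saturated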
5. Which modern technique is recommended to ensure good weight initialization in deep networks?
A. Uniform random initialization between -1 and 1, without bias.
B. Kaiming (He) initialization adapted to the activation function used.
C. Initialize weights to zero and biases to 0.5.
D. Exclusively use batch normalization (batch norm) without worrying about initialization.
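A minimal sketch of the technique named in question 5 (PyTorch assumed; the layer shape is illustrative): Kaiming (He) initialization scales the weights by a gain that depends on the activation function, divided by the square root of the fan-in.

import torch
import torch.nn as nn

layer = nn.Linear(200, 100)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')  # std about sqrt(2 / fan_in)
nn.init.zeros_(layer.bias)

# Manual equivalent: std = gain / sqrt(fan_in), where the gain depends on the activation
gain = nn.init.calculate_gain('tanh')            # 5/3 for tanh
W = torch.randn(200, 100) * gain / (200 ** 0.5)
print(layer.weight.std().item(), W.std().item())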