Interactive Quiz
Test your knowledge!
1. What is the main effect of initializing weights too large in a network with the tanh activation function?
A. Neurons have activations close to zero, facilitating gradient propagation.
B. Neurons have activations close to +1 or -1, which causes a very small gradient.
C. The loss is lower at initialization, which speeds up training.
D. Biases become useless and can be initialized to random values.
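A minimal sketch of the saturation effect behind question 1 (PyTorch assumed; the layer sizes and the scale factor are illustrative, not from the quiz): with over-large weights, the pre-activations are huge and tanh pins most activations near +1 or -1.

import torch

torch.manual_seed(0)
x = torch.randn(32, 100)                # a batch of 32 examples with 100 features
w_large = torch.randn(100, 200) * 5.0   # deliberately over-scaled weights
h = torch.tanh(x @ w_large)             # pre-activations are huge, so tanh saturates
print((h.abs() > 0.99).float().mean())  # fraction of activations stuck near +/-1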
2. Which simple method helps reduce an abnormally high loss at the initialization of a network?
A. Initialize all weights to zero.
B. Multiply the weights by a small value (e.g., 0.01) and initialize biases to zero.
C. Use a sigmoid activation function instead of tanh.
D. Increase the batch size to stabilize the loss.
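A minimal sketch of the fix discussed in question 2 (PyTorch assumed; the hidden size and vocabulary size are hypothetical): shrinking the output-layer weights and zeroing the biases makes the initial logits near zero, so the initial cross-entropy loss lands close to the uniform-guess baseline instead of being abnormally high.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 27                          # hypothetical number of output classes
h = torch.randn(32, 200)                 # stand-in hidden activations
W = torch.randn(200, vocab_size) * 0.01  # small initial weights
b = torch.zeros(vocab_size)              # zero initial biases
logits = h @ W + b                       # logits are all close to zero
y = torch.randint(0, vocab_size, (32,))
print(F.cross_entropy(logits, y).item())     # close to the uniform baseline
print(torch.log(torch.tensor(27.0)).item())  # -log(1/27), about 3.30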
3. Why should the weights of a neural network not be initialized to zero?
A. Because it leads to a uniform distribution of activations.
B. Because it prevents differentiation between neurons, blocking learning.
C. Because it increases the initial loss.
D. Because it speeds up training too much and causes overfitting.
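A minimal sketch of the problem behind question 3 (PyTorch assumed; the shapes are illustrative): with all weights at zero, the hidden activations are tanh(0) = 0 and no gradient reaches the weights, so the neurons never differentiate from one another and nothing is learned.

import torch

x = torch.randn(8, 10)
y = torch.randn(8, 3)
W1 = torch.zeros(10, 4, requires_grad=True)  # all-zero hidden weights
W2 = torch.zeros(4, 3, requires_grad=True)   # all-zero output weights

h = torch.tanh(x @ W1)                  # every hidden unit outputs exactly 0
loss = ((h @ W2) - y).pow(2).mean()
loss.backward()
print(W1.grad.abs().max(), W2.grad.abs().max())  # both 0: no signal to separate the neurons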
4. What is the derivative of the tanh function, and why is it important for the problem of dead neurons?
A. tanh'(t) = 1 - tanh(t), meaning the gradient is always high.
B. tanh'(t) = 1 - tanh(t)^2, so when tanh(t) is close to ±1 the gradient is very small.
C. tanh'(t) = t, which prevents gradient propagation.
D. tanh'(t) = exp(-t), which causes gradient explosion.
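A minimal sketch checking the identity in question 4 (PyTorch assumed; the sample inputs are illustrative): autograd's gradient for tanh matches 1 - tanh(t)^2, and it collapses toward zero once the activation saturates near ±1.

import torch

t = torch.tensor([0.0, 1.0, 3.0, 6.0], requires_grad=True)
out = torch.tanh(t)
out.sum().backward()
print(out.detach())           # approaches +/-1 as |t| grows
print(t.grad)                 # gradient computed by autograd
print((1 - out**2).detach())  # matches 1 - tanh(t)^2; near zero once saturated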
5. Which modern technique is recommended to ensure good weight initialization in deep networks?
A. Uniform random initialization between -1 and 1, without bias.
B. Kaiming (He) initialization adapted to the activation function used.
C. Initialize weights to zero and biases to 0.5.
D. Exclusively use batch normalization (batch norm) without worrying about initialization.
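A minimal sketch of the technique named in question 5 (PyTorch assumed; the layer shape is illustrative): Kaiming (He) initialization scales the weights by a gain that depends on the activation function, divided by the square root of the fan-in.

import torch
import torch.nn as nn

layer = nn.Linear(200, 100)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')  # std about sqrt(2 / fan_in)
nn.init.zeros_(layer.bias)

# Manual equivalent: std = gain / sqrt(fan_in), where the gain depends on the activation
gain = nn.init.calculate_gain('tanh')            # 5/3 for tanh
W = torch.randn(200, 100) * gain / (200 ** 0.5)
print(layer.weight.std().item(), W.std().item())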