Interactive Quiz
Test your knowledge!
1. What is Natural Language Processing (NLP) in the context of machine learning?
A. It is a method for processing color images
B. It is a set of text-related tasks such as translation and understanding
C. It is an algorithm for sorting numbers
D. It is only speech recognition
2. In a bigram model for predicting the next character, what is the prediction based on?
A. The three previous characters
B. A single previous character
C. All characters in the text
D. No previous characters; it is random
3. Why is a special character '.' added at the beginning and end of first names in the bigram model?
A. To increase the size of the dataset
B. To indicate punctuation
C. To model the probability of the first and last letter
D. To replace vowels
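The bigram setup behind questions 2 and 3 can be sketched by counting character pairs, with '.' wrapped around each name so the model learns which letters tend to start and end one. The tiny name list and function names below are illustrative, not from the course:

```python
from collections import Counter

# Toy list of first names; in practice this would be a large dataset.
names = ["emma", "olivia", "ava"]

# Wrap each name with the boundary marker '.' so the counts capture
# which characters begin and end a name.
counts = Counter()
for name in names:
    chars = "." + name + "."
    for ch1, ch2 in zip(chars, chars[1:]):
        counts[(ch1, ch2)] += 1

def bigram_prob(prev, nxt):
    """P(next char | previous char) via normalized counts."""
    total = sum(c for (a, _), c in counts.items() if a == prev)
    return counts[(prev, nxt)] / total if total else 0.0
```

Prediction then depends only on the single previous character: `bigram_prob(".", "a")` is the probability that a name starts with "a".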
4. What is the main problem with the counting method for N-grams when N increases?
A. The number of parameters becomes exponentially large
B. Accuracy automatically decreases
C. Characters become corrupted
D. The model becomes too fast
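The parameter explosion in question 4 is easy to make concrete: a count table over a vocabulary of size V needs one entry per N-character combination, i.e. V**N entries. A minimal sketch, assuming 26 letters plus the '.' marker:

```python
V = 27  # 26 letters plus the '.' boundary marker

def ngram_table_size(n, vocab=V):
    # One count per possible N-character combination: V**N entries.
    return vocab ** n

# The table grows exponentially with N: 27^2 = 729, 27^3 = 19683,
# 27^5 is already over 14 million entries, most of them never observed.
sizes = {n: ngram_table_size(n) for n in range(2, 6)}
```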
5. What is the main advantage of using a neural network with a fully connected layer for predicting the next character compared to the counting method?
A. It can handle a variable-sized context without an explosion in model size
B. It requires no training data
C. It always predicts the most frequent letter
D. It uses only a reduced input dimension of 1
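The contrast in question 5 can be sketched by comparing parameter counts: the count table grows exponentially with context length, while a fully connected model with an embedding grows only linearly. The vocabulary, embedding, and hidden sizes below are illustrative assumptions, not values from the course:

```python
V, d, H = 27, 10, 200  # vocabulary, embedding dim, hidden size (illustrative)

def table_params(n):
    # Counting method: one entry per N-character combination.
    return V ** n

def mlp_params(n):
    # Embedding C, hidden layer W1 + b1, output layer W2 + b2.
    context = n - 1
    return V * d + (context * d) * H + H + H * V + V

# For n = 4 the table needs 531441 entries, the MLP about 12 thousand
# parameters, and each extra context character adds only d * H more.
```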
6. What does the embedding matrix \( C \) represent in the fully connected model inspired by Bengio et al.?
A. A transformation that encodes characters into a continuous latent space
B. A dictionary of character frequencies
C. A filter to remove rare characters
D. A convolution matrix
7. What is the main advantage of the hyperbolic tangent (tanh) activation function compared to the sigmoid in hidden layers?
A. It always has a positive output
B. It facilitates learning with zero-centered output and larger gradients
C. It is faster to compute
D. It does not require gradient calculation
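The two properties named in question 7 can be checked numerically: tanh is zero-centered with outputs in (-1, 1), and its derivative at 0 is 1.0 versus 0.25 for the sigmoid, so gradients shrink less when passing through it. A small sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

def dtanh(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0 when x = 0

# tanh(0) = 0 (zero-centered), sigmoid(0) = 0.5 (always positive);
# at 0 the tanh gradient is four times the sigmoid gradient.
```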
8. What is the main motivation for using a Recurrent Neural Network (RNN) for character sequence prediction?
A. To avoid specifying a fixed context size and to retain memory of the entire previous context
B. To enable massive parallelization on GPUs
C. To use only the last character for prediction
D. To reduce the number of parameters to a single weight
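The motivation in question 8 comes from the RNN update itself: the same step function is applied at every position, and the hidden state summarizes everything seen so far, so no context length has to be fixed in advance. A scalar sketch with illustrative weights (a real layer uses learned matrices):

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.5, b=0.0):
    # New hidden state mixes the current input with the previous state.
    return math.tanh(w_x * x + w_h * h_prev + b)

def encode_sequence(xs):
    # The same parameters are reused at every step, so the sequence
    # can be arbitrarily long; h carries memory of all previous inputs.
    h = 0.0
    for x in xs:
        h = rnn_step(x, h)
    return h
```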
9. What is one of the main problems encountered by classical RNNs on long sequences?
A. Only the exploding gradient problem
B. Difficulty propagating information over long sequences (vanishing gradient)
C. Inability to process short sequences
D. They can only process sequences of fixed length
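The vanishing gradient of question 9 can be illustrated directly: backpropagating through T steps of a simple tanh RNN multiplies the gradient by roughly w * tanh'(h) at each step, and when that factor is below 1 the product shrinks exponentially. A rough sketch with illustrative values:

```python
import math

def gradient_scale(w, h, steps):
    # Per-step factor: recurrent weight times the tanh derivative;
    # raised to the number of steps the gradient flows back through.
    factor = w * (1.0 - math.tanh(h) ** 2)
    return factor ** steps

short = gradient_scale(0.9, 0.5, 5)    # still a usable magnitude
long_ = gradient_scale(0.9, 0.5, 100)  # many orders of magnitude smaller
```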
10. What is the main innovation introduced by the LSTM layer compared to a classical RNN?
A. It replaces the softmax function with a sigmoid function
B. It introduces gates (forget, input, output) to manage short-term and long-term memory
C. It uses only convolutions to process the sequence
D. It directly predicts an entire sequence in a single step
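The gate structure in question 10 can be sketched as a scalar LSTM step: each gate is a sigmoid in (0, 1) deciding how much old memory to keep, how much new information to write, and how much of the memory to expose. The scalar weights below are illustrative; a real layer uses learned matrices per gate:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=1.0, b=0.0):
    f = sigmoid(w * x + w * h_prev + b)          # forget gate: keep old memory?
    i = sigmoid(w * x + w * h_prev + b)          # input gate: write new info?
    o = sigmoid(w * x + w * h_prev + b)          # output gate: expose memory?
    c_tilde = math.tanh(w * x + w * h_prev + b)  # candidate memory content
    c = f * c_prev + i * c_tilde                 # long-term memory (cell state)
    h = o * math.tanh(c)                         # short-term (hidden) state
    return h, c
```

The additive update of `c` is what lets gradients flow over long spans: when the forget gate stays near 1, old memory passes through almost unchanged.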