Interactive Quiz
Test your knowledge!
1. What is the main function of a tokenizer in the context of language models (LLMs)?
A. Convert text into a sequence of integers (tokens)
B. Translate text from one language to another
C. Generate responses from a given context
D. Evaluate the performance of a language model
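Option A describes the core job of a tokenizer: mapping text to integer ids and back. The following is a toy word-level sketch, purely for illustration; real tokenizers operate on subword units, and all names here are hypothetical.

```python
# Toy word-level tokenizer: maps text to a sequence of integers and back.
# Hypothetical sketch, not any production tokenizer.

def build_vocab(corpus):
    """Assign an integer id to every distinct word in the corpus."""
    words = sorted(set(corpus.split()))
    return {w: i for i, w in enumerate(words)}

def encode(text, vocab):
    """Convert text into a sequence of integers (tokens)."""
    return [vocab[w] for w in text.split()]

def decode(ids, vocab):
    """Invert the mapping: integer ids back to text."""
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

vocab = build_vocab("the cat sat on the mat")
ids = encode("the cat sat", vocab)
print(ids)                  # [4, 0, 3]
print(decode(ids, vocab))   # the cat sat
```

The model itself only ever sees the integer ids, never the raw characters.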
2. Why do LLMs often struggle with simple string operations, such as reversing a word?
A. Because LLMs do not understand the syntax of words
B. Because words are tokenized into chunks of characters rather than character by character
C. Because language models do not process strings
D. Because the LLM vocabulary is too small to contain all words
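Option B can be made concrete: a model that sees multi-character chunks cannot easily manipulate individual characters. The chunk inventory below is hypothetical, chosen only to mimic BPE-style tokens.

```python
# Toy illustration of why token-level models struggle to reverse strings:
# the model sees multi-character chunks, not individual characters.

def toy_tokenize(word):
    # pretend the tokenizer learned these multi-character chunks (hypothetical)
    chunks = ["straw", "berry"]
    tokens, rest = [], word
    while rest:
        for c in chunks:
            if rest.startswith(c):
                tokens.append(c)
                rest = rest[len(c):]
                break
        else:
            # fall back to a single character if no chunk matches
            tokens.append(rest[0])
            rest = rest[1:]
    return tokens

tokens = toy_tokenize("strawberry")
print(tokens)                        # ['straw', 'berry']
print("".join(reversed(tokens)))     # berrystraw  (reversal at token level)
print("strawberry"[::-1])            # yrrebwarts  (true character reversal)
```

Reversing the sequence of tokens gives `berrystraw`, not the character-level reversal the user asked for; the model never directly sees the letters inside a chunk.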
3. What is the main advantage of the byte-pair encoding (BPE) algorithm in building a tokenizer?
A. It reduces the vocabulary size to fewer than 100 tokens
B. It allows increasing the vocabulary size while reducing the length of tokenized sequences
C. It automatically translates texts into English
D. It replaces each word with a single Unicode character
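Option B is the trade-off at the heart of BPE: each merge adds one vocabulary entry and shortens every sequence where that pair occurs. A minimal sketch of a single merge step, assuming character-level starting symbols:

```python
# Minimal sketch of one byte-pair encoding (BPE) merge step: find the most
# frequent adjacent pair of symbols and merge it into a new vocabulary entry.
from collections import Counter

def most_frequent_pair(symbols):
    """Return the adjacent pair that occurs most often in the sequence."""
    return Counter(zip(symbols, symbols[1:])).most_common(1)[0][0]

def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])  # new merged token
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

seq = list("abababcd")                # start from individual characters
pair = most_frequent_pair(seq)        # ('a', 'b') occurs 3 times
seq = merge_pair(seq, pair)
print(seq)   # ['ab', 'ab', 'ab', 'c', 'd'] -- shorter sequence, bigger vocab
```

Repeating this step grows the vocabulary one merge at a time while the tokenized sequences keep getting shorter.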
4. Why does the GPT-2 tokenizer make the model less effective for processing Python code?
A. Because Python uses special characters that GPT-2 does not recognize
B. Because each indentation space is counted as a separate token, rapidly increasing the context size
C. Because GPT-2 does not understand programming language syntax
D. Because GPT-2 cannot tokenize numbers correctly
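Option B can be quantified with a toy counting rule that mimics the GPT-2 behavior: one token per leading space, plus one per remaining word. The counting function is hypothetical, for illustration only.

```python
# Toy sketch of the indentation problem: under a GPT-2-style scheme where
# each leading space becomes its own token, indented Python code spends
# many tokens before reaching any real content.

def count_tokens_naive(line):
    """Hypothetical count: one token per leading space + one per word."""
    indent = len(line) - len(line.lstrip(" "))
    return indent + len(line.split())

code = [
    "def f(x):",
    "        if x > 0:",           # 8 spaces of indentation
    "                return x",    # 16 spaces of indentation
]
for line in code:
    print(count_tokens_naive(line), line)
# the doubly indented line costs 18 tokens, 16 of them pure whitespace
```

Deeply nested code therefore fills the context window with whitespace tokens, which is one reason later tokenizers merge runs of spaces into single tokens.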
5. What is the main impact of a tokenizer trained primarily on English data on an LLM's performance in other languages?
A. The model uses more tokens to express the same sentence in other languages, limiting the effective context size
B. The model automatically translates foreign texts into English before processing
C. The tokenizer removes all non-English characters
D. The model performs better in Japanese than in English
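Option A follows from how byte-level fallback works: text the tokenizer never learned to merge decomposes into many byte tokens. The two-word vocabulary and fallback rule below are hypothetical, chosen only to make the asymmetry visible.

```python
# Toy sketch of an English-centric tokenizer with byte-level fallback:
# in-vocabulary English words cost one token each, while out-of-vocabulary
# text (here, Japanese) falls back to one token per UTF-8 byte.

VOCAB = {"hello", "world"}  # hypothetical learned vocabulary

def count_tokens(text):
    total = 0
    for word in text.split():
        if word in VOCAB:
            total += 1                          # known word: one token
        else:
            total += len(word.encode("utf-8"))  # fallback: one token per byte
    return total

print(count_tokens("hello world"))     # 2 tokens
print(count_tokens("こんにちは 世界"))    # 21 tokens for the same greeting
```

The same message costs roughly ten times as many tokens, so a fixed context window holds far less non-English text.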