Change eos to chat control token of the assistant
Please see the transformers library documentation on how a chat template is created: https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-create-a-chat-template
So while <|endoftext|> is necessary for training the model, and perhaps also for the qwen2-7b base model template, it is not strictly necessary for inference with a chat template such as yours. For inference, <|im_end|> is sufficient, because <|im_end|> is generated BEFORE <|endoftext|>.
Many GUI frontends have one or more of three features that hide special tokens from the user: (1) they make special tokens invisible in the GUI, (2) they have a custom stop token implementation that lets users manually choose the end-of-sequence / chat control token, or (3) they have some kind of autodetection that can recognize the assistant's chat control tokens (a chat template parser). GPT4All 3.2.0 does not have the first two (they have some disadvantages, which I do not think need to be detailed here), and its parsing apparently expects the eos to be set to the assistant's chat control token rather than to the eos of the whole generation.
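To make the ordering argument concrete, here is a minimal sketch of what a stop-token check does with a raw completion. The helper truncate_at_stop and the sample string raw_output are hypothetical illustrations, not GPT4All or llama.cpp code; the only thing taken from the discussion above is that <|im_end|> appears before <|endoftext|>.

```python
# Special tokens of the ChatML-style template discussed above.
IM_END = "<|im_end|>"
END_OF_TEXT = "<|endoftext|>"


def truncate_at_stop(text, stop_tokens):
    """Return text up to the earliest occurrence of any stop token."""
    # Collect the positions of the stop tokens that actually occur.
    positions = [text.find(tok) for tok in stop_tokens if tok in text]
    # If no stop token is present, return the text unchanged.
    return text[: min(positions)] if positions else text


# Hypothetical raw model output: the assistant turn is closed first,
# and the end of the whole sequence follows afterwards.
raw_output = "Hello! How can I help?" + IM_END + "\n" + END_OF_TEXT

# Stopping at <|im_end|> leaves a clean reply.
clean = truncate_at_stop(raw_output, [IM_END])
# Stopping only at <|endoftext|> leaks the chat control token into the reply.
leaky = truncate_at_stop(raw_output, [END_OF_TEXT])
```

In this sketch, a frontend that stops only at <|endoftext|> would show the stray <|im_end|> to the user unless it hides special tokens some other way.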
Before:
With eos set to <|endoftext|> in the tokenizer_config.json:
After:
With eos set to <|im_end|> in the tokenizer_config.json:
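For reference, the edit behind the two cases above is a single-field change. The following is a rough sketch using only the standard library; set_eos_token is a hypothetical helper, and it assumes eos_token is stored as a plain string in tokenizer_config.json (some tokenizers store it as an object instead, which this sketch does not handle).

```python
import json
from pathlib import Path


def set_eos_token(config_path, new_eos):
    """Set "eos_token" in a tokenizer_config.json and return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_eos = config.get("eos_token")
    # Assumption: eos_token is a plain string, so it can simply be overwritten.
    config["eos_token"] = new_eos
    # ensure_ascii=False keeps non-ASCII tokens readable in the written file.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return old_eos
```

Calling set_eos_token("tokenizer_config.json", "<|im_end|>") would apply the "After" state; the model still needs to be re-converted to GGUF for the change to reach the quantized file.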
Summary:
For inference, the precedence is: (eos and bos as declared with the chat template in tokenizer_config.json) > (eos and bos in config.json) > (bos and eos in generation_config.json).
At least this is the case when the model is quantized to GGUF with llama.cpp. You can test this by setting a value to null.
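The observed order can be sketched as a simple "first non-null wins" lookup. This is only an illustration of the behaviour described in the summary, not the actual llama.cpp conversion logic; resolve_token is a hypothetical helper, and the three dictionaries are simplified stand-ins that share one key name (the real files mix token strings and token ids).

```python
def resolve_token(name, tokenizer_config, model_config, generation_config):
    """Return the first non-null value for `name`, highest precedence first."""
    # Order follows the summary: tokenizer_config.json, then config.json,
    # then generation_config.json.
    for source in (tokenizer_config, model_config, generation_config):
        value = source.get(name)
        if value is not None:
            return value
    return None


# Simplified stand-ins for the three files.
tokenizer_cfg = {"eos_token": "<|im_end|>", "bos_token": None}
model_cfg = {"eos_token": "<|endoftext|>"}
generation_cfg = {"eos_token": "<|endoftext|>"}

# The tokenizer_config.json value wins while it is set.
winner = resolve_token("eos_token", tokenizer_cfg, model_cfg, generation_cfg)
# Setting it to null lets the next file in the order take over.
fallback = resolve_token("eos_token", {"eos_token": None}, model_cfg, generation_cfg)
```

Under this reading, nulling a value in a higher-precedence file is how you check which lower-precedence file is consulted next.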
Notes:
Setting bos in tokenizer_config.json to either <|endoftext|> (which is similar to null in this case) or, conversely, to <|im_start|> WILL change the model's responses. I will leave it as null for now.
For anyone who wants to dive deeper into the related issues, discussions, commits, and code that came up during my research today, see also:
https://github.com/huggingface/transformers/issues/26862
https://github.com/huggingface/transformers/pull/31301
https://github.com/huggingface/transformers/pull/29459
https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.eos_token_id