Replete-AI/Replete-LLM-Qwen2-7b · Change eos to chat control token of the assistant

Aug 13


Please have a look at the transformers documentation on how a chat template is created: https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-create-a-chat-template

So while <|endoftext|> is necessary to train the model, and may also be necessary for the qwen2-7b base model template, it is not strictly necessary for inference with a chat template such as yours. For inference, it is sufficient to use <|im_end|>, because <|im_end|> is generated BEFORE <|endoftext|>.

Many GUI frontends have one or more of three features that hide special tokens from the user: (1) they make special tokens invisible in the GUI, (2) they have a custom stop token implementation that lets users (manually) choose the end-of-sequence / chat control token, or (3) they have some kind of autodetection that is smart enough to detect the assistant's chat control tokens (a chat template parser). GPT4All 3.2.0 does not have the first two (they have some disadvantages, and I don't think it is necessary to go into detail here), and its parsing apparently expects the eos to be set to the assistant's chat control token rather than to the eos of the whole generation.
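To make the point about token order concrete, here is a minimal sketch of what a frontend without special-token hiding would display, depending on which stop token it honours. The helper `truncate_at_stop` and the sample string are hypothetical and only for illustration; this is not GPT4All's actual implementation.

```python
def truncate_at_stop(text: str, stop_tokens: list[str]) -> str:
    """Return text up to (not including) the earliest stop token found."""
    cut = len(text)
    for token in stop_tokens:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up raw output, assuming <|im_end|> is emitted before <|endoftext|>.
RAW_OUTPUT = "Paris is the capital of France.<|im_end|>\n<|endoftext|>"

# Stopping only on <|endoftext|> leaves the chat control token visible.
shown_with_endoftext = truncate_at_stop(RAW_OUTPUT, ["<|endoftext|>"])

# Stopping on <|im_end|> ends the reply right after the assistant's turn.
shown_with_im_end = truncate_at_stop(RAW_OUTPUT, ["<|im_end|>"])

print(repr(shown_with_endoftext))
print(repr(shown_with_im_end))
```

With <|im_end|> as the stop token, the reply is cut before any special token reaches the user, which is the behaviour this PR is after.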

Before:

With eos set to <|endoftext|> in the tokenizer_config.json:

After:

With eos set to <|im_end|> in the tokenizer_config.json:
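The change itself is a one-field edit. A minimal sketch of applying it with a script, assuming the `eos_token` entry in tokenizer_config.json is either a plain string or an object with a `content` field (both forms occur in practice); the helper names `set_eos_token` and `rewrite_tokenizer_config` are made up for this example:

```python
import json
from pathlib import Path


def set_eos_token(tokenizer_config: dict, new_eos: str) -> dict:
    """Return a copy of the config with eos_token pointing at new_eos."""
    updated = dict(tokenizer_config)
    current = updated.get("eos_token")
    if isinstance(current, dict):
        # Object form: keep the other attributes, swap only the content.
        updated["eos_token"] = {**current, "content": new_eos}
    else:
        # Plain string (or missing / null): store the token directly.
        updated["eos_token"] = new_eos
    return updated


def rewrite_tokenizer_config(path: str, new_eos: str) -> None:
    """Load tokenizer_config.json from path, update eos_token, write it back."""
    file = Path(path)
    config = json.loads(file.read_text(encoding="utf-8"))
    updated = set_eos_token(config, new_eos)
    file.write_text(
        json.dumps(updated, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

For example, `rewrite_tokenizer_config("tokenizer_config.json", "<|im_end|>")` would apply the "After" state to a local copy of the file.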

Summary:

For inference, the precedence is: (eos and bos as declared with the chat template in the tokenizer_config.json) > (eos and bos in config.json) > (eos and bos in generation_config.json)

At least this is the case when the model is quantized to GGUF with llama.cpp.
You can test this by setting the values to null.
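When testing this, it helps to see what each of the three files currently declares. The following is a rough sketch, assuming the usual field names (`bos_token` / `eos_token` in tokenizer_config.json, `bos_token_id` / `eos_token_id` in the other two files); the function name `collect_special_tokens` is hypothetical. It only reads and reports the values and does not implement the precedence logic of llama.cpp or any frontend.

```python
import json
from pathlib import Path

# Which keys to report from which file (assumed field names).
FIELDS = {
    "tokenizer_config.json": ("bos_token", "eos_token"),
    "config.json": ("bos_token_id", "eos_token_id"),
    "generation_config.json": ("bos_token_id", "eos_token_id"),
}


def collect_special_tokens(model_dir: str) -> dict:
    """Report the bos/eos settings found in each config file of model_dir.

    A missing file maps to None. A key that is absent or set to null in an
    existing file is also reported as None.
    """
    report = {}
    for filename, keys in FIELDS.items():
        path = Path(model_dir) / filename
        if not path.is_file():
            report[filename] = None
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        report[filename] = {key: data.get(key) for key in keys}
    return report
```

Running `collect_special_tokens(".")` in a local checkout of the model repository before and after setting a value to null shows which file you actually changed, so you can attribute any difference in behaviour to that file.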

Notes:

Setting bos in tokenizer_config.json to either <|endoftext|> (which is similar to null in this case) or, conversely, to <|im_start|> WILL change the model's responses. I will leave it as null for now.

Change eos to chat control token of the assistant c1d9cea6

rombodawg changed pull request status to merged Aug 13

ThiloteE

Aug 13

For anybody who really wants to do a deep dive into the related issues, discussions, commits and code that came up during my research today, see also:

https://github.com/huggingface/transformers/issues/26862
https://github.com/huggingface/transformers/pull/31301
https://github.com/huggingface/transformers/pull/29459
https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig.eos_token_id