ThiloteE committed
Commit c1d9cea
1 Parent(s): e356943

Change eos to chat control token of the assistant


Please see the transformers library documentation on how a chat template is created: https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-create-a-chat-template

![image.png](https://cdn-uploads.huggingface.co/production/uploads/654bf4de09dd7ef52485f49f/ygk-i_5m-Qb4vZSf1lBIT.png)

So while `<|endoftext|>` is necessary to train the model, and may also be needed for the qwen2-7b base model template, it is not strictly necessary when doing inference on the model with a chat template such as yours. For inference it is sufficient to use `<|im_end|>`, because `<|im_end|>` emerges BEFORE `<|endoftext|>` during token generation. Many GUI frontends have one or a combination of up to three features that hide special tokens from the user:

1. they make special tokens invisible in the GUI, or
2. they have a custom stop-token implementation that lets users (manually) choose the end-of-sequence / chat control token, or
3. they have some kind of magic autodetection that is intelligent enough to detect the assistant's chat control tokens (a chat template parser).

GPT4All 3.2.0 has neither of the first two (they come with some disadvantages, and going into detail about them is not necessary here), and its magic parsing apparently expects the eos to be set to the assistant's chat control token rather than the eos of the whole generation.
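To see concretely why `<|im_end|>` emerges before the final `<|endoftext|>`, the chat template can be rendered directly. A minimal sketch with plain jinja2, using the template string copied verbatim from the diff below:

```python
from jinja2 import Template

# Chat template copied verbatim from this repo's tokenizer_config.json.
chat_template = (
    r"{% for message in messages %}"
    r"{% if loop.first and messages[0]['role'] != 'system' %}"
    r"{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}"
    r"{% endif %}"
    r"{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    r"{% endfor %}"
    r"{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
    r"{{ '<|endoftext|>' }}"
)

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]

# Every turn, including the assistant's, is closed by <|im_end|>; the lone
# <|endoftext|> only appears at the very end of the rendered conversation.
print(Template(chat_template).render(messages=messages, add_generation_prompt=False))
```

So a frontend that stops on the assistant's chat control token will always halt at `<|im_end|>` and never needs to see `<|endoftext|>`.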

### Before:
With eos set to `<|endoftext|>` in the tokenizer_config.json:
![Replete-eos-test3.jpg](https://cdn-uploads.huggingface.co/production/uploads/654bf4de09dd7ef52485f49f/647iVSkStQ1r7ZmfeNXCP.jpeg)

### After:
With eos set to `<|im_end|>` in the tokenizer_config.json:
![Replete-eos-test4-tokenizer_config-trumps-all.jpg](https://cdn-uploads.huggingface.co/production/uploads/654bf4de09dd7ef52485f49f/n5k8CDWXDt55eiegoK_PZ.jpeg)

I have done a lot of tests today, and all of them point to the precedence order below.


### Summary:
For inference: (what the chat template in `tokenizer_config.json` declares as eos and bos) **>** (eos and bos in `config.json`) **>** (eos and bos in `generation_config.json`)

At least this is the case when the model is quantized to GGUF with llama.cpp.
You can test this by setting one of the values to `null` and observing which one wins.
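A minimal sketch for inspecting what each file declares, assuming it is run inside a local checkout of the model repository (note that `tokenizer_config.json` stores token strings, while `config.json` and `generation_config.json` store token ids):

```python
import json
from pathlib import Path

# Assumes the current directory is a local checkout of the model repo.
for name in ("tokenizer_config.json", "config.json", "generation_config.json"):
    path = Path(name)
    if not path.exists():
        print(f"{name}: not present")
        continue
    cfg = json.loads(path.read_text())
    # tokenizer_config.json stores token strings; the other two store token ids.
    eos = cfg.get("eos_token", cfg.get("eos_token_id"))
    bos = cfg.get("bos_token", cfg.get("bos_token_id"))
    print(f"{name}: eos={eos!r} bos={bos!r}")
```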

### Notes:
Setting bos in `tokenizer_config.json` to either `<|endoftext|>` (which behaves similarly to `null` in this case) or, conversely, to `<|im_start|>` WILL change the model's responses. I will leave it at `null` for now.

Files changed (1):
1. tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -30,7 +30,7 @@
   "bos_token": null,
   "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}{{ '<|endoftext|>' }}",
   "clean_up_tokenization_spaces": false,
-  "eos_token": "<|endoftext|>",
+  "eos_token": "<|im_end|>",
   "errors": "replace",
   "model_max_length": 32768,
   "pad_token": "<|endoftext|>",