Chat template is broken after tool calling support was added to the template
After this MR was merged (https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/16bd7875cfcddb94d3bd5af433110e9b892b76e4), the chat template stopped working for assistant messages.
A sequence of system -> user messages: WORKS
A sequence of a single user message: WORKS
A sequence of user -> assistant -> user messages: DOESN'T WORK
A sequence of system -> user -> assistant -> user messages: DOESN'T WORK
The error message is:
After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
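For reference, the rule in that error message can be restated as a small Python check. This is a hypothetical restatement written from the error text alone, not the template's actual code. Under this rule all four sequences listed above are valid, so the two failing ones should not be rejected.

```python
# Hypothetical restatement of the rule in the error message; NOT the Nemo template itself.
ERROR = (
    "After the optional system message, conversation roles must "
    "alternate user/assistant/user/assistant/..."
)


def check_alternation(messages):
    """Raise ValueError unless roles alternate user/assistant/... after an optional system message."""
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]  # the leading system message is optional and not counted
    for i, role in enumerate(roles):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(ERROR)


# The four sequences from the report; each one satisfies the rule.
sequences = {
    "system -> user": ["system", "user"],
    "user": ["user"],
    "user -> assistant -> user": ["user", "assistant", "user"],
    "system -> user -> assistant -> user": ["system", "user", "assistant", "user"],
}
for name, roles in sequences.items():
    check_alternation([{"role": r, "content": "..."} for r in roles])
    print(f"{name}: OK")
```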
It works for me...
Do you have an example message sequence?
I deployed the model with vllm and used the openai/v1/chat/completions endpoint to try it out.
Here is the request payload I'm using:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"
    },
    {
      "role": "user",
      "content": "Tell me a joke"
    },
    {
      "role": "assistant",
      "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"
    },
    {
      "role": "user",
      "content": "Great, tell another one!"
    }
  ],
  "model": "mistralai-mistral-nemo-instruct-2407",
  "max_tokens": 8096,
  "stream": false,
  "temperature": 0.3
}
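To replay the same request from Python without extra dependencies, here is a minimal sketch using only the standard library. The host, port and path prefix in `ENDPOINT` are hypothetical placeholders for your own deployment, and `build_request`/`send` are illustrative helper names. The payload is copied verbatim from the JSON above.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint: adjust host, port and prefix to match your deployment.
ENDPOINT = "http://localhost:8080/openai/v1/chat/completions"

# Copied verbatim from the payload above (including its original wording).
payload = {
    "model": "mistralai-mistral-nemo-instruct-2407",
    "messages": [
        {"role": "system", "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"},
        {"role": "user", "content": "Tell me a joke"},
        {"role": "assistant", "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"},
        {"role": "user", "content": "Great, tell another one!"},
    ],
    "max_tokens": 8096,
    "stream": False,
    "temperature": 0.3,
}


def build_request(url, body):
    """Build (but do not send) a JSON POST request for an OpenAI-compatible endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(req, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request(ENDPOINT, payload)
# result = send(req)  # uncomment when a server is running at ENDPOINT
```

Keeping the request construction separate from sending makes it easy to diff the exact bytes sent here against what another client sends.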
Once I manually replaced tokenizer_config.json with the content from the 16bd7875cfcddb94d3bd5af433110e9b892b76e4 revision, it worked fine.
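The same pinning can be done without hand-editing files. Below is a minimal sketch that assumes the standard Hub `resolve/<revision>/<filename>` download URL and a repository readable without authentication (a gated repository would also need an access token). It fetches `tokenizer_config.json` at the revision reported as working and returns its `chat_template`. `resolve_url` and `fetch_chat_template` are hypothetical helper names.

```python
import json
import urllib.request

REPO_ID = "mistralai/Mistral-Nemo-Instruct-2407"
# Revision reported in this thread as working.
PINNED_REVISION = "16bd7875cfcddb94d3bd5af433110e9b892b76e4"


def resolve_url(repo_id, revision, filename):
    """Build a Hugging Face Hub download URL for a file at a fixed revision."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_chat_template(repo_id=REPO_ID, revision=PINNED_REVISION):
    """Download tokenizer_config.json at `revision` and return its chat_template entry (or None)."""
    url = resolve_url(repo_id, revision, "tokenizer_config.json")
    with urllib.request.urlopen(url, timeout=30) as resp:
        config = json.load(resp)
    return config.get("chat_template")
```

With transformers, `AutoTokenizer.from_pretrained(REPO_ID, revision=PINNED_REVISION)` pins the whole tokenizer to that commit instead. `apply_chat_template` also accepts an explicit `chat_template=` string if you only want to swap the template.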
Hi there! Interesting: the chat template and tokenizer seem to work well on my end with this repository. The following script runs without any problem:
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

messages = [
    {
        "role": "system",
        "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"
    },
    {
        "role": "user",
        "content": "Tell me a joke"
    },
    {
        "role": "assistant",
        "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"
    },
    {
        "role": "user",
        "content": "Great, tell another one!"
    }
]

hf_text = hf_tokenizer.apply_chat_template(messages, tokenize=False)
hf_tokens = hf_tokenizer.apply_chat_template(messages, tokenize=True)
print(hf_text)
print(hf_tokens)
Is it possible that the error comes from an outdated transformers version or an out-of-date configuration?
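To rule out a stale environment, here is a short sketch that reports the installed versions of the packages on the rendering path (transformers, jinja2 and the serving stack), similar to grepping `pip freeze`. It uses only the standard library's `importlib.metadata`. The package list is an assumption, so adjust it to your setup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "jinja2", "vllm", "kserve")):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Run it inside the same environment as the server process, because a separate virtualenv can easily report different versions.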
Thank you for looking into this.
I'm a bit lost at this point, as I can reproduce your script working in my environment (i.e. with the same version of the transformers library, etc.).
At this point, I believe something could be wrong with the way either vllm or kserve (kserve/kserve) loads the tokenizer config.
Here's the full stack trace I'm facing. I'll keep looking for the root cause of this issue and will stick to the 16bd7875cfcddb94d3bd5af433110e9b892b76e4 revision for now.
  File "/kserve/kserve/protocol/rest/openai/endpoints.py", line 123, in create_chat_completion
    completion = await self.dataplane.create_chat_completion(
  File "/kserve/kserve/protocol/rest/openai/dataplane.py", line 100, in create_chat_completion
    return await model.create_chat_completion(completion_request)
  File "/kserve/kserve/protocol/rest/openai/openai_chat_adapter_model.py", line 204, in create_chat_completion
    chat_prompt = self.apply_chat_template(params.messages)
  File "/huggingfaceserver/huggingfaceserver/vllm/vllm_model.py", line 63, in apply_chat_template
    prompt=self.openai_serving_completion.apply_chat_template(messages)
  File "/huggingfaceserver/huggingfaceserver/vllm/vllm_completions.py", line 356, in apply_chat_template
    return self.tokenizer.apply_chat_template(
  File "/prod_venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1833, in apply_chat_template
    rendered_chat = compiled_template.render(
  File "/prod_venv/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/prod_venv/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 14, in top-level template code
  File "/prod_venv/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/prod_venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1914, in raise_exception
    raise TemplateError(message)
jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
It looks like your problem stems from kserve: it might have pinned an older transformers version that sets up the template environment in a way that breaks the rendering.
The version of transformers used by kserve (in the pre-release I'm currently testing) is 4.43.3 (ref), which I also confirmed via pip freeze.
I suspect that some combination of arguments passed to compiled_template.render(...) breaks the rendering of the latest Nemo template. By "passed arguments" I mean tools, documents and template_kwargs.
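One way to probe this hypothesis offline is a simplified model of a tool-aware alternation check. The sketch rests on an ASSUMPTION that is not taken from the actual Nemo template: tool results, and any message whose `tool_calls` field is present and not None, are skipped when counting turns. Under that assumption, a serving layer that adds an empty `tool_calls` list to plain assistant messages would remove the assistant turn from the count. That leaves user -> user and triggers the reported error, while the same conversation without the field passes. Treat this as a hypothesis to test, not a diagnosis.

```python
ERROR = (
    "After the optional system message, conversation roles must "
    "alternate user/assistant/user/assistant/..."
)


def is_tool_turn(message):
    # ASSUMPTION (hypothetical rule): tool results and any message whose tool_calls
    # field is present and not None are excluded from the alternation count.
    # An empty list [] is "not None", so it is excluded too under this rule.
    return message["role"] in ("tool", "tool_results") or message.get("tool_calls") is not None


def check_alternation_skipping_tools(messages):
    """Raise ValueError if the counted (non-tool) turns do not alternate user/assistant."""
    if messages and messages[0]["role"] == "system":
        messages = messages[1:]
    index = 0
    for message in messages:
        if is_tool_turn(message):
            continue
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(ERROR)
        index += 1


def passes(messages):
    """Return True if the hypothetical check accepts the conversation."""
    try:
        check_alternation_skipping_tools(messages)
        return True
    except ValueError:
        return False


plain = [
    {"role": "system", "content": "Reply concisely."},
    {"role": "user", "content": "Tell me a joke"},
    {"role": "assistant", "content": "Why don't bears wear shoes? ..."},
    {"role": "user", "content": "Great, tell another one!"},
]
# Same conversation as a hypothetical server might forward it: an empty
# tool_calls list attached to the plain assistant turn.
with_empty_tool_calls = [dict(m) for m in plain]
with_empty_tool_calls[2]["tool_calls"] = []

print(passes(plain))                  # True
print(passes(with_empty_tool_calls))  # False
```

To check whether this applies, log the exact `messages` list the server hands to `apply_chat_template` (the `vllm_completions.py` frame in the traceback) and look for extra per-message fields such as `tool_calls`.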
Hi, I have the exact same problem. I'm also using vllm (version 0.6.3), and it fails on a system -> user -> assistant -> user sequence with Nemo-Instruct-2407, where the user/assistant pair in the middle represents a tool call example.
Is there already a workaround?