Is the chat template correct? (issue for vLLM)
If I prompt the model with text and an image, the template renders [IMG] without a newline \n, but the correct version seems to include the \n.
I also tried deploying with vLLM and passing this chat template. It does not raise an error, but it produces gibberish results.
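For illustration, the difference in question looks roughly like this (hypothetical renderings; the exact placement of the newline is my assumption):
# Hypothetical renderings of a single text + image turn; only the newline differs
without_newline = "<s>[INST]describe the image[IMG][/INST]"
with_newline = "<s>[INST]describe the image[IMG]\n[/INST]"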
Thanks in advance!
I found another typo in the chat template:
{%- if message["content"] is not string %}
{%- for chunk in message["content"] %}
{%- if chunk["type"] == "text" %}
- {{- chunk["content"] }}
+ {{- chunk["text"] }}
{%- elif chunk["type"] == "image" %}
{{- "[IMG]" }}
{%- else %}
{{- raise_exception("Unrecognized content type!") }}
{%- endif %}
{%- endfor %}
{%- else %}
{{- message["content"] }}
{%- endif %}
To be compatible with the OpenAI schema, the inner key should be text, not content.
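For reference, this is what a user message looks like in the OpenAI-style schema (the prompt text and URL here are just placeholders):
# OpenAI-style multimodal message: text chunks use the "text" key, not "content"
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "describe the image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
}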
This modified template fixes the issue for me:
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{{- bos_token }}
{%- for message in loop_messages %}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
{{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
{%- endif %}
{%- if message["role"] == "user" %}
{%- if loop.last and system_message is defined %}
{{- "[INST]" + system_message + "\n" }}
{%- else %}
{{- "[INST]" }}
{%- endif %}
{%- if message["content"] is not string %}
{%- for chunk in message["content"] %}
{%- if chunk["type"] == "text" %}
{{- chunk["text"] }}
{%- elif chunk["type"] == "image" %}
{{- "[IMG]" }}
{%- else %}
{{- raise_exception("Unrecognized content type!") }}
{%- endif %}
{%- endfor %}
{%- else %}
{{- message["content"] }}
{%- endif %}
{{- "[/INST]" }}
{%- elif message["role"] == "assistant" %}
{{- message["content"] + eos_token}}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}
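In case anyone wants to try it before the repo is updated, the modified template can be passed directly to apply_chat_template as a string (a sketch; the messages here are just an example):
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

template = "..."  # the modified Jinja template above, pasted as a single string

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {"type": "image"},
        ],
    },
]
print(processor.apply_chat_template(messages, chat_template=template))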
Hey, I will update the template. Indeed it is not aligned with how most VLMs work, thanks for letting me know.
UPD: Done, the template now accepts text and content as valid keys.
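If I understand the update correctly, both of these text chunks should now render identically (a minimal illustration):
# With the updated template, either inner key should work for text chunks
chunk_openai_style = {"type": "text", "text": "describe the image"}
chunk_old_style = {"type": "text", "content": "describe the image"}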
I think there is still a bug in the template (at least when using vLLM to serve).
So something like this works:
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    }
],
But when I add an extra turn:
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
            {"role": "assistant", "content": "image depicts something"},
            {"role": "user", "content": "check again"},
        ],
    }
],
I get an error when applying the chat template in the transformers code.
ile "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 371, in create_chat_completion
generator = await handler.create_chat_completion(request, raw_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 165, in create_chat_completion
) = await self._preprocess_chat(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 479, in _preprocess_chat
request_prompt = apply_hf_chat_template(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 988, in apply_hf_chat_template
return tokenizer.apply_chat_template(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1683, in apply_chat_template
rendered_chat = compiled_template.render(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/jinja2/environment.py", line 1295, in render
self.environment.handle_exception()
File "/usr/local/lib/python3.12/dist-packages/jinja2/environment.py", line 942, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 34, in top-level template code
TypeError: can only concatenate list (not "str") to list
Using the latest vLLM version, v0.6.6.post1.
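For context, the TypeError appears to come from the assistant branch of the template, which concatenates message["content"] with the EOS token and therefore expects a plain string; a minimal reproduction of the underlying Python error:
eos_token = "</s>"
content = [{"type": "text", "text": "image depicts something"}]  # list-style content
content + eos_token  # TypeError: can only concatenate list (not "str") to list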
The second conversation is not correctly constructed. If you want to apply the HF template, each turn should be written separately, as below:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image",
                "image_url": {"url": image_url},
            },
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "image depicts something"},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "check again"},
        ],
    },
]
@RaushanTurganbay my bad, I pasted the code incorrectly in the comment, but I did try the correct conversation format and I get the same error with vLLM.
I tried just the processor from transformers, and it works if I change the messages to the following:
from transformers import AutoProcessor

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image",
            },
        ],
    },
    {
        "role": "assistant",
        "content": "test",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "check again"},
        ],
    },
]

model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
processor.apply_chat_template(messages)
>> '<s>[INST]describe the image[IMG][/INST]test</s>[INST]check again[/INST]'.
It does give the same error as vLLM if I keep the assistant message content nested:
{
    "role": "assistant",
    "content": [
        {"type": "text", "text": "image depicts something"},
    ],
},
Sorry, my bad, I didn't notice that the template had extra if-nesting for each role. It should be okay now with alternating roles.
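A quick way to verify, assuming the updated template has been pushed to the repo, is to render the nested assistant message that previously failed:
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {"type": "image"},
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "image depicts something"}],
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "check again"}],
    },
]
print(processor.apply_chat_template(messages))  # should no longer raise the TypeError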