Is the chat template correct? (issue for vLLM)

#22 opened by MichaelAI23

If I prompt the model with text and an image, the rendered prompt contains [IMG] without a newline \n, but the correct version seems to include the \n.
I also tried deploying with vLLM and passing this chat template: it does not produce an error, but the model generates gibberish.
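
For reference, here is roughly how I inspect the rendered prompt (a minimal sketch using the processor from this repo):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {"type": "image"},
        ],
    }
]
# With the current template this renders something like
# '<s>[INST]describe the image[IMG][/INST]' -- no newline after [IMG]
print(processor.apply_chat_template(messages))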

Thanks in advance!

I found another typo in the chat template:

          {%- if message["content"] is not string %}
              {%- for chunk in message["content"] %}
                  {%- if chunk["type"] == "text" %}
-                     {{- chunk["content"] }}
+                     {{- chunk["text"] }}
                  {%- elif chunk["type"] == "image" %}
                      {{- "[IMG]" }}
                  {%- else %}
                      {{- raise_exception("Unrecognized content type!") }}
                  {%- endif %}
              {%- endfor %}
          {%- else %}
              {{- message["content"] }}
          {%- endif %}

To be compatible with the OpenAI schema, the inner key should be text, not content.
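
In other words, a text chunk in OpenAI-style content carries its payload under a text key, not a content key:

{"type": "text", "content": "describe the image"}   # wrong: not OpenAI-compatible
{"type": "text", "text": "describe the image"}      # correct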

This modified template fixes the issue for me:

{%- if messages[0]["role"] == "system" %}
    {%- set system_message = messages[0]["content"] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif %}
    {%- if message["role"] == "user" %}
        {%- if loop.last and system_message is defined %}
            {{- "[INST]" + system_message + "\n" }}
        {%- else %}
            {{- "[INST]" }}
        {%- endif %}
        {%- if message["content"] is not string %}
            {%- for chunk in message["content"] %}
                {%- if chunk["type"] == "text" %}
                    {{- chunk["text"] }}
                {%- elif chunk["type"] == "image" %}
                    {{- "[IMG]" }}
                {%- else %}
                    {{- raise_exception("Unrecognized content type!") }}
                {%- endif %}
            {%- endfor %}
        {%- else %}
            {{- message["content"] }}
        {%- endif %}
        {{- "[/INST]" }}
    {%- elif message["role"] == "assistant" %}
        {{- message["content"] + eos_token}}
    {%- else %}
        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
    {%- endif %}
{%- endfor %}
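
To try the modified template without editing the repo, it can be passed explicitly (a sketch; pixtral_template.jinja is an assumed local file name, and vLLM's OpenAI-compatible server can likewise be pointed at the file with --chat-template):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# assumed local file containing the template above
with open("pixtral_template.jinja") as f:
    chat_template = f.read()

messages = [{"role": "user", "content": [{"type": "text", "text": "describe the image"}, {"type": "image"}]}]
print(processor.apply_chat_template(messages, chat_template=chat_template))
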
Unofficial Mistral Community org

Hey, I will update the template. Indeed it is not aligned with how most VLMs work, thanks for letting us know.

UPD: Done, the template now accepts both text and content as valid keys.

I think there is still a bug in the template (at least when serving with vLLM).

So something like this works:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image_url",
                "image_url": {"url": image_url},
            },
        ],
    },
],

But when adding an extra turn:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image"},
            {
                "type": "image_url",
                "image_url": {"url": image_url},
            },
            {"role": "assistant", "content": "image depicts something"},
            {"role": "user", "content": "check again"},
        ],
    },
],

I get an error when the chat template is applied in the transformers code.

ile "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 371, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 165, in create_chat_completion
    ) = await self._preprocess_chat(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 479, in _preprocess_chat
    request_prompt = apply_hf_chat_template(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 988, in apply_hf_chat_template
    return tokenizer.apply_chat_template(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1683, in apply_chat_template
    rendered_chat = compiled_template.render(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/jinja2/environment.py", line 1295, in render
    self.environment.handle_exception()
  File "/usr/local/lib/python3.12/dist-packages/jinja2/environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 34, in top-level template code
TypeError: can only concatenate list (not "str") to list

Using the latest vLLM version, v0.6.6.post1.

Unofficial Mistral Community org

The second conversation is not correctly constructed. If you want to apply the HF template, each turn should be written separately, as below:

messages = 
    [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe the image"},
                {
                    "type": "image",
                    "image_url": {"url": image_url},
                }
            ],
        },
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "image depicts something"},
            ],
        },
        {
            "role": "user",
            "content":[
                {"type": "text", "text": "check again"},
            ]
        },
]

@RaushanTurganbay my bad, I pasted the code incorrectly in the comment, but I did try the correct conversation format and I get the same error with vLLM.

I tried just the processor from transformers, and it works if I change the messages to the following:

from transformers import AutoProcessor

messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe the image"},
                {
                    "type": "image",
                }
            ],
        },
        {
            "role": "assistant",
            "content": "test",
        },
        {
            "role": "user",
            "content":[
                {"type": "text", "text": "check again"},
            ]
        },
]
model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
processor.apply_chat_template(messages)

>>  '<s>[INST]describe the image[IMG][/INST]test</s>[INST]check again[/INST]'. 

It gives the same error as vLLM if I keep the assistant message content nested as a list (presumably because the template's assistant branch concatenates message["content"] directly with eos_token, which fails when the content is a list):

{
    "role": "assistant",
    "content": [
        {"type": "text", "text": "image depicts something"},
    ],
},
Unofficial Mistral Community org

Sorry, my bad, I didn't notice that the template had extra if-nesting for each role. It should be okay now with alternating roles.
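
With alternating roles, the same multi-turn request against a vLLM OpenAI-compatible server would look roughly like this (a sketch; the endpoint, model name, and image URL are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint
image_url = "https://example.com/image.jpg"  # placeholder image URL

response = client.chat.completions.create(
    model="mistral-community/pixtral-12b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe the image"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
        {"role": "assistant", "content": "image depicts something"},
        {"role": "user", "content": "check again"},
    ],
)
print(response.choices[0].message.content)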
