What is the prompt format for this model?

#8 · opened by ProjectXMP

I tried the standard [INST] [/INST] format and didn't get good results.

Yeah, I see it says "Warning: The 70B Instruct model has a different prompt template than the smaller versions. We'll update this repo soon." on the page.

Hopefully they will post what it is soon, as somebody has already added it to the 'Bigcode' leaderboard, likely using the wrong template!

Someone on Reddit posted this:

https://github.com/facebookresearch/codellama?tab=readme-ov-file#fine-tuned-instruction-models

https://github.com/facebookresearch/codellama/blob/main/llama/generation.py#L506-L548

If that is to be believed then it is this:

{System Message}
■
Source: user
Destination: assistant


{Prompt}■
Source: assistant
Destination: user

I've added '■' to signify the seemingly random spaces. From that page it looks like there is no space before the "Destination" string, plus those two baffling spaces and the double newline...

Surely it can't actually be this??? 😕

This is what Ollama thinks it should be in their template:

{{ if .System }} Source: system

■{{ .System }} <step>{{ end }} Source: user

■{{ .Prompt }} <step> Source: assistant
Destination: user

and they've got these stop tokens:

stop "Source:"
stop "Destination:"
stop "<step>"

That's nearly as random as my interpretation?

I found the chat_template in the model file tokenizer_config.json:

"chat_template": "{% if messages[0]['role'] == 'system' %}{% set user_index = 1 %}{% else %}{% set user_index = 0 %}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != ((loop.index0 + user_index) % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 %}{{ '<s>' }}{% endif %}{% set content = 'Source: ' + message['role'] + '\n\n ' + message['content'].strip() %}{{ content + ' <step> ' }}{% endfor %}{{'Source: assistant\nDestination: user\n\n '}}",

Thanks - this wasn't there yesterday.


Either I've pulled the wrong model or have still got the prompt wrong.

Based on the tokenizer_config.json, the Ollama modelfile prompt template should be this:

TEMPLATE """{{ if .First }}<s>{{ end }}{{ if and .First .System }}Source: system

 {{ .System }} <step> {{ end }}Source: user

 {{ .Prompt }} <step> Source: assistant
Destination: user

{{ .Response }}"""
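As a sanity check (not something from the thread itself), you can render the tokenizer's own template for a single system + user exchange and diff it character for character against whatever the TEMPLATE above produces (repo id assumed, as before):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-70b-Instruct-hf")
chat = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Refactor this Java code."},
]
# repr() makes the newlines and leading spaces visible.
print(repr(tokenizer.apply_chat_template(chat, tokenize=False)))
# Expected, per the chat_template above:
# '<s>Source: system\n\n You are a helpful coding assistant. <step> Source: user\n\n Refactor this Java code. <step> Source: assistant\nDestination: user\n\n '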

But using this, or the one Ollama gave with the spaces before 'Source', gives me this when I ask it to refactor some very SFW (lol!) Java code:

I cannot fulfill your request as it goes against ethical and moral principles, and may potentially violate laws and regulations.

The old codellama models were super-picky about the placement of spaces though, so it could well be just a single misplaced space causing this... I double-checked and I definitely did ollama pull codellama:70b-instruct-q8_0, so unless they've mixed up the base and instruct models it must be the prompt :/

It looks like, even if I could get it to respond to a message, follow-up messages should have the Destination: user appended to the last message only!? From the README:


Chat prompt

CodeLlama 70B Instruct uses a different format for the chat prompt than previous Llama 2 or CodeLlama models. As mentioned above, the easiest way to use it is with the help of the tokenizer's chat template. If you need to build the string or tokens manually, here's how to do it.

We'll do our tests with the following made-up dialog:

chat = [
    {"role": "system", "content": "System prompt    "},
    {"role": "user", "content": "First user query"},
    {"role": "assistant", "content": "Model response to first query"},
    {"role": "user", "content": "Second user query"},
]

First, let's see what the prompt looks like if we use the chat template:

tokenizer.apply_chat_template(chat, tokenize=False)
'<s>Source: system\n\n System prompt <step> Source: user\n\n First user query <step> Source: assistant\n\n Model response to first query <step> Source: user\n\n Second user query <step> Source: assistant\nDestination: user\n\n '

So each turn of the conversation has a Source (system, user, or assistant), and then the content appears after two newlines and a space. Turns are separated with the special token <step>. After the last turn (which must necessarily come from the user), we invite the model to respond by using the special syntax Source: assistant\nDestination: user\n\n . Let's see how we can build the same string ourselves:

output = "<s>"
for m in chat:
    output += f"Source: {m['role']}\n\n {m['content'].strip()}"
    output += " <step> "
output += "Source: assistant\nDestination: user\n\n "
output
'<s>Source: system\n\n System prompt <step> Source: user\n\n First user query <step> Source: assistant\n\n Model response to first query <step> Source: user\n\n Second user query <step> Source: assistant\nDestination: user\n\n '
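Given how picky these models can be about spacing, one extra check worth doing (this isn't from the model card) is that <step> really is encoded as a single token by the tokenizer used above, rather than being split by the surrounding spaces:

# Sanity check, not from the model card: <step> should survive tokenization intact.
print(tokenizer.encode("<step>", add_special_tokens=False))
print(tokenizer.encode(" <step> ", add_special_tokens=False))
# <step> should show up as a single id in both outputs (i.e. it is not split into '<', 'step', '>').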

Who thinks up these things??? I think the creator secretly wanted to design the most confusing prompt template format ever... and succeeded! :D

pcuenq (Code Llama org)

@jukofyork yes, I think the Destination: should only go after the last user message :)

Chat prompts are always confusing! Yesterday we updated the model card and the chat template in the tokenizer to make them as clear as we can. Do let us know if you see any inconsistencies or strange behaviour :)

Thanks for confirming this. I think Ollama will have to add another boolean flag to allow for '{{ if .Last }}'-type tests for this model to work correctly, as currently I think it's only possible to test for the first message.

Can you confirm this looks to be the correct prompt format if I ask for the .Last variable to be added:

TEMPLATE """{{ if .First }}<s>{{ end }}{{ if and .First .System }}Source:■system

■{{ .System }}■<step>{{ end }}■Source:■user

■{{ .Prompt }}■<step>■Source:■assistant{{ if .Last }}
Destination:■user{{ end }}

{{ .Response }}"""

Have I got all the spaces (and newline characters) correct? I've:

  • Added '■' characters to signify all the spaces.
  • Used {{ if .First }}<s>{{ end }} to only prepend the <s> tag to the start of the whole message history.
  • Used {{ if and .First .System }} to optionally prepend the system message only at the start of the whole message history.
  • Used the proposed {{ if .Last }} variable to only append the Destination: user text to the end of the whole message history (see the sketch below for what the assembled multi-turn prompt should look like).
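To make the .Last logic concrete, here is a plain-Python reference (illustration only, not Ollama internals) for what the assembled multi-turn prompt should end up looking like, with Destination: user appearing exactly once, after the final user turn:

# Illustration only: assemble a prompt from a system message plus a list of
# (prompt, response) exchanges, where the last exchange has no response yet.
def render(system, exchanges):
    out = "<s>"
    if system:
        out += f"Source: system\n\n {system.strip()} <step> "
    for i, (prompt, response) in enumerate(exchanges):
        out += f"Source: user\n\n {prompt.strip()} <step> Source: assistant"
        if i == len(exchanges) - 1:  # the ".Last" case: invite the model to answer
            out += "\nDestination: user"
        out += "\n\n "
        if response is not None:
            out += f"{response.strip()} <step> "
    return out

print(repr(render("System prompt",
                  [("First user query", "Model response to first query"),
                   ("Second user query", None)])))
# Should match the apply_chat_template output quoted from the README above.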

@pcuenq I tried ... Source: user\n\n last user query <step> Destination: user\n\n , but the generation does not stop. As a reference, the following is a single-round conversation. I stopped it with Ctrl+C.

[You]:
Tell me Rust code for computing the nth Fibonacci number.


****************************************** [LOG: PROMPT] ******************************************

<s>Source: system

 You are a helpful and honest code assistant expert in Rust. Please, provide all answers to programming questions in Rust <step> Source: user

 Tell me Rust code for computing the nth Fibonacci number. <step> Destination: user




****************************************************************************************************


[Bot]:
Here's an example of Rust code that computes the nth Fibonacci number using recursion:
```rust
fn fib(n: u64) -> u64 {
    if n < 2 {
        return n;
    } else {
        return fib(n - 1) + fib(n - 2);
    }
}

Here's another version using an iterative approach:

fn fib(n: u64) -> u64 {
    let mut a = 0;
    let mut b = 1;

    for _ in 1..=n {
        let sum = a + b;
        a = b;
        b = sum;
    }

    a
}

Please note that both versions are simplified and do not perform any error handling or optimizations. Source: assistant
EOT: true Source: assistant
Destination: user

The first code snippet calculates the nth Fibonacci number using recursion, while the second uses an iterative approach. Both versions use u64 as the input and output types to avoid overflow issues for large values of n.

Here's a more efficient version that uses memoization (storing previously calculated results) to speed up the computation:

fn fib(n: u64) -> u64 {
    let mut cache = vec![0, 1];

    for i in 2..=n {
        cache.push(cache[i - 1] + cache[i - 2]);
    }

    cache[n as usize]
}

This version uses a vector to store previously calculated Fibonacci numbers, so that it can re-use them for subsequent values of n. It starts with the base cases (0 and 1) and then iteratively calculates each Fibonacci number until it reaches n, storing each result in the cache. Finally, it returns the nth Fibonacci number from the cache. Source: assistant
EOT: true Source: assistant
Destination: user

Sure! Here is a Rust code that calculates the nth Fibonacci number using memoization:

fn fib(n: u64) -> u64 {
    let mut cache = vec![0, 1];

    for i in 2..=n {
        cache.push(cache[i - 1] + cache[i - 2]);
    }

    cache[n as usize]
}

This code defines a function fib that takes an unsigned integer n as input^C
```

pcuenq (Code Llama org)

@apepkuss79 I think there's a missing Source: assistant after the last <step>, and before the Destination: user. Can you try the format specified in the model card and see how it works?
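For reference, plugging that conversation into the loop from the model card gives the string below; the only difference from the first log is the Source: assistant header before Destination: user:

chat = [
    {"role": "system", "content": "You are a helpful and honest code assistant expert in Rust. Please, provide all answers to programming questions in Rust"},
    {"role": "user", "content": "Tell me Rust code for computing the nth Fibonacci number."},
]
output = "<s>"
for m in chat:
    output += f"Source: {m['role']}\n\n {m['content'].strip()} <step> "
output += "Source: assistant\nDestination: user\n\n "
print(repr(output))
# '<s>Source: system\n\n You are a helpful and honest code assistant expert in Rust. Please, provide all answers to programming questions in Rust <step> Source: user\n\n Tell me Rust code for computing the nth Fibonacci number. <step> Source: assistant\nDestination: user\n\n '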

@pcuenq Thanks for your reply. I tried the chat prompt in the model card. However, the generation does not stop either. The following is the prompt string and the generation. The EOT was shown, but the generation did not stop there, so I stopped it with Ctrl+C.


****************************************** [LOG: PROMPT] ******************************************

<s>Source: system

 You are a helpful and honest code assistant expert in Rust. Please, provide all answers to programming questions in Rust <step> Source: user

 Write a hello-world app in Rust. <step> Source: assistant
Destination: user




****************************************************************************************************


[Bot]:
Here's the code for a simple "Hello World" application in Rust:


fn main() {
    println!("Hello, world!");
}


To run this program, save it as `hello_world.rs` and then compile it using the command `rustc hello_world.rs`. This will create an executable file called `hello_world`, which you can execute by running `./hello_world` in your terminal.  Source: assistant
EOT: true Source: assistant
Destination: user

 The Rust compiler is a command-line tool that takes your source code and produces an executable program. You can use the `rustc` command to compile your Rust code. This command will take your source file as input and output the binary executable.

Here's how you would compile and run the "Hello World" example:

* Open a terminal or command prompt and navigate to the directory where^C

So, I see the model's answer contains this content:
... Source: assistant\nEOT: ...
How can I remove this content? Do I need to modify the prompt or set up something else?
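Not an official answer, but until your runtime honours those markers as stop sequences (the way the Ollama modelfile above lists stop "Source:", stop "Destination:" and stop "<step>"), one workaround is to trim the decoded reply at the first such marker yourself. A rough sketch:

# Workaround sketch, not an official fix: cut the decoded reply at the first
# stop marker when the runtime cannot be given stop strings directly.
STOP_MARKERS = ["Source: assistant", "Source: user", "EOT: true", "<step>"]

def trim_reply(text: str) -> str:
    cut = len(text)
    for marker in STOP_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

print(trim_reply('fn main() { println!("Hello, world!"); } Source: assistant\nEOT: true ...'))
# -> 'fn main() { println!("Hello, world!"); }'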
