Chat prompt format?

#3
by mike-ravkine - opened

Based on reverse engineering https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L212 here's what I've come up with:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

 [/INST]</s>
<s>[INST] {{prompt}} [/INST]

It seems to work but holy hell is this 'chat' finetune lobotomized - it will take any excuse it can find to refuse any and all requests.

I have also run some testing with the system prompt changed to:

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.

This helps quite a bit with the refusals, instead of a flat out "this is unethical" it will complain a bit and then answer (what it imagines to be) a safer version of your question.

That is similar to my conclusion about the format, but as far as my understanding of the code goes the system message is attached to the first prompt, rather than standing on it's own. Meaning the first [INST] block contains both the system message and the first instruction. Then all subsequent [INST] blocks contain only the instruction.

That also matches with the understanding that asgeir and jxy independently came up with which suggest it is probably correct.

Either way the current prompt template displayed in the model card is definitively wrong. @TheBloke it would be nice if you could replace it quickly since there will be a lot of people trying out these models right now.

GIST: https://gist.github.com/the-crypt-keeper/8d781a12ee515903edc89ef69383570f

Result for a single <prompt> input:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

<prompt> [/INST]

Result for a <prompt> then assistant says <answer> then user follows up with <prompt-second>:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

<prompt> [/INST] <answer> </s><s>[INST] <prompt-second> [/INST]

@Mikael110 thanks.

mike-ravkine changed discussion status to closed

@mike-ravkine You're confident that there's no </s> if it's only a single prompt, is that right?

Do you know a good prompt (including SYS) in order to generate a concise/rephrased summary that captures the main points of a text based on?: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

I would like to replicate the results I get from the demo: https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat

There I only type "Summarize the following text: {text}" and it provides a great summary wile the same prompt with the template from above is not satisfying at all.

Do you know a good prompt (including SYS) in order to generate a concise/rephrased summary that captures the main points of a text based on?: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

I would like to replicate the results I get from the demo: https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat

There I only type "Summarize the following text: {text}" and it provides a great summary wile the same prompt with the template from above is not satisfying at all.

Can you provide an example with screenshots of both prompts and results?

They're using almost exactly the same prompt (though seemingly without the opening <s>): https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat/blob/main/model.py

@clayp Yes the EOS is only involved when the conversation goes multi-turn and the leading BOS doesn't seem to make any difference I get the same output.

@Sven00 See https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat/blob/main/model.py#L24 for the exact function they use to assemble the prompt. It matches my PR in #4 except for the BOS token that's appended at the beginning in their original repo but not in this app.

I'm pretty sure llama.cpp's tokenizer won't currently tokenize and to BOS/EOS, you'd need to preprocess and poke in the tokens yourself.

@ycros I'm trying to handle that here.

def convert_to_llama_format(messages):
''' Function to convert a message or an array of user and System prompts to llama format.'''
if isinstance(messages, list) == False:
messages = f'[INST] <> <>\n\n {messages} [/INST]'
return(messages)

formatted_messages = "[INST] <<SYS>> <</SYS>>"
for message in messages:
    role = message["role"]
    content = message["content"]
    
    if role == "system":
        formatted_messages = f"[INST]<<SYS>>\n{content}\n<</SYS>>"
    if role == "user":
        formatted_messages = formatted_messages + f"\n\n{content} [/INST]"
    if role == "assistant":
        formatted_messages = formatted_messages + f"\n\n{content} [INST]"

return(formatted_messages)

Input ->
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Knock knock."},
{"role": "assistant", "content": "Who's there?"},
{"role": "user", "content": "Orange."},
]

Output ->
[INST]<>
You are a helpful assistant.
<>

Knock knock. [/INST]

Who's there? [INST]

Orange. [/INST]

This comment has been hidden

Sign up or log in to comment