Prompt Template?

by fakezeta

Based on https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/prompts/prompt.py I see that v1 uses ChatML while v2 uses an Alpaca-style prompt.

What is the correct Prompt Template for v3?

I'd also like to know

If even the mighty TheBloke cannot sort it out, I'm relieved :-)

@TheBloke FYI: I've tested most of the templates from the GitHub link above, and the v2 template appears to work quite well. The other templates generate bad responses.

### System:
{system_prompt}

### User:
{user_prompt}

### Assistant:
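
For anyone who wants to try it, here is a minimal sketch of filling that template in Python; the model id, prompts, and generation settings are placeholders I chose for illustration, not something confirmed in this thread:

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Intel/neural-chat-7b-v3"  # placeholder id; use the actual repo you downloaded

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # Single-turn prompt in the "### System / ### User / ### Assistant" style above.
    return (
        f"### System:\n{system_prompt}\n\n"
        f"### User:\n{user_prompt}\n\n"
        "### Assistant:\n"
    )

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tokenizer(build_prompt("You are a helpful assistant.", "Explain ChatML briefly."), return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt itself.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))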
 

@dranger003 The template is correct, thank you!

I did some tests. It feels to me that the model prefers the following format (without the extra newlines):

### System:
$SYSTEM</s>
### User: $PROMPT0
### Assistant: $REPLY0</s>
### User: $PROMPT1
### Assistant:
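
For what it's worth, here is a small helper that assembles a multi-turn prompt in exactly that shape; the </s> placement follows the format above, and the function and variable names are just mine:

def build_multiturn_prompt(system, turns):
    # turns is a list of (user_prompt, assistant_reply) pairs; pass None as the
    # final reply to leave the prompt open for the model to complete.
    parts = [f"### System:\n{system}</s>\n"]
    for user, reply in turns:
        parts.append(f"### User: {user}\n")
        parts.append("### Assistant:" if reply is None else f"### Assistant: {reply}</s>\n")
    return "".join(parts)

prompt = build_multiturn_prompt(
    "You are a helpful assistant.",
    [("Hi there!", "Hello! How can I help?"), ("What's the capital of France?", None)],
)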

Sigh... I wish people would stick to a consistent prompt format, at least when sharing on Hugging Face. It makes it so much more tedious to test and compare different LLMs when the prompting templates aren't normalized.

Intel org

Sorry for being late. It has been updated in the model card.

> Sigh... I wish people would stick to a consistent prompt format, at least when sharing on Hugging Face. It makes it so much more tedious to test and compare different LLMs when the prompting templates aren't normalized.

Hugging Face has been trying to alleviate this issue with chat_template:
https://huggingface.co/docs/transformers/chat_templating
The model creator can store the format there, and the rest of us can just call it and have the input ready to go by providing our raw input strings.
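
For example, once a template is stored on the tokenizer, a downstream user only needs something like this (the model id here is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/neural-chat-7b-v3")  # illustrative id

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What prompt format do you expect?"},
]

# Renders the messages with whatever template the model author shipped,
# so callers never hard-code the prompt format themselves.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)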

I tried running this model with transformers, and I see this warning:

No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set tokenizer.chat_template to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

It'd be great if Intel could leverage the new chat_templating functionality from HF, which makes things easy and accurate for all downstream users.
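
Until that happens, you can set the template yourself. The Jinja below is my own attempt at encoding the "### System / ### User / ### Assistant" format from this thread, so verify the exact whitespace against the model card before relying on it:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/neural-chat-7b-v3")  # illustrative id

# Unofficial Jinja encoding of the format discussed in this thread; double-check
# the whitespace and EOS placement against the model card.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}### System:\n{{ message['content'] }}\n\n"
    "{% elif message['role'] == 'user' %}### User:\n{{ message['content'] }}\n\n"
    "{% elif message['role'] == 'assistant' %}### Assistant:\n{{ message['content'] }}\n\n"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}### Assistant:\n{% endif %}"
)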

Thank you for your work!

@Haihao I'm curious, is there a particular reason for selecting this prompt template format?

More and more LLMs are using the standard ChatML template, like so:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>

With dedicated special tokens for <|im_start|> and <|im_end|>.
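
As a concrete sketch (assuming a transformers tokenizer/model pair; the base model id is just an example, not from this thread), registering them as special tokens looks like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"  # example base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Register the ChatML delimiters as dedicated special tokens so the tokenizer
# never splits them into sub-word pieces.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})

# Grow the embedding matrix so the new token ids have embeddings to train.
model.resize_token_embeddings(len(tokenizer))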

Unfortunately I can't provide real data... but I can say that, from subjective experience, models with the ChatML template seem to behave better as chat agents, probably because every turn has a clear token designating its beginning and end. It'd be great if future neural-chat models used the same ChatML template!

@andysalerno Hi, good points, and I agree with your idea. Actually, we used the special tokens <|im_start|> and <|im_end|> to train the MPT and Llama 2 models. In future iterations of this model, we will consider using the special tokens <|im_start|> and <|im_end|>.

An apple with monstrous teeth tries to devour an adult.
