Chat template
Hello
What is the chat template? I presume it's ChatML, since you added the ChatML tokens?
Would you mind documenting this in the model card?
Adding this to the tokenizer config would help:
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
So is it confirmed that it's 100% ChatML?
instruction_template: |-
  {%- for message in messages %}
      {%- if message['role'] == 'system' -%}
          {{- '<|im_start|>system\n' + message['content'] + '<|im_end|>\n' -}}
      {%- else -%}
          {%- if message['role'] == 'user' -%}
              {{- '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' -}}
          {%- else -%}
              {{- '<|im_start|>assistant\n' + message['content'] + '<|im_end|>\n' -}}
          {%- endif -%}
      {%- endif -%}
  {%- endfor -%}
  {%- if add_generation_prompt -%}
      {{- '<|im_start|>assistant\n' -}}
  {%- endif -%}
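For anyone who wants to see what the `chat_template` string posted above produces, here is a minimal plain-Python sketch of the same layout. The function name `render_chatml` and the sample messages are made up for illustration; this is not the tokenizer's own code, just an equivalent written by hand from that template.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hand-written equivalent of the Jinja chat_template posted above (illustrative only)."""
    # Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    # Optionally open an assistant turn so the model continues from there.
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt


# Hypothetical sample conversation.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is DBRX?"},
]
print(render_chatml(messages, add_generation_prompt=True))
```

If the tokenizer's template really is plain ChatML, its rendered prompt for the same messages should match this string character for character, which makes for a quick sanity check.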
The chat template is in the tokenizer file: https://huggingface.co/databricks/dbrx-instruct/blob/17365204e9cf13e2296ee984c1ab48071e861efa/tiktoken.py#L198-L221
Hi @MaziyarPanahi and @ehartford, we just updated the tokenizer to remove the tiktoken dependency and use the standard GPT2Tokenizer. Could you try it out and let us know if you see any issues?
Hello @MaziyarPanahi
Could you guide me on how to use the ChatML template above for fine-tuning? I want to fine-tune DBRX using LongLoRA, and my dataset is in ChatML format.
Which library/framework are you using for fine-tuning? Most of them come with good dataset/prompt handling, such as the Hugging Face alignment-handbook or Axolotl. They take care of it for you.
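If the dataset is stored as raw ChatML text rather than as role/content messages, it may need to be split back into turns before a framework can apply its own template. A rough sketch, assuming each sample is a single string made of `<|im_start|>role\n...<|im_end|>` turns (the helper name `parse_chatml` and the sample text are hypothetical):

```python
import re

# One ChatML turn: <|im_start|>{role}\n{content}<|im_end|>
CHATML_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text):
    """Split a ChatML-formatted string into a list of role/content messages."""
    return [
        {"role": role, "content": content}
        for role, content in CHATML_TURN.findall(text)
    ]


# Hypothetical sample in ChatML format.
sample = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>\n"
)
print(parse_chatml(sample))
```

This only handles well-formed turns; anything outside an `<|im_start|>`/`<|im_end|>` pair is silently dropped, so it is worth spot-checking a few parsed samples.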
I'm trying to use LLM Foundry for it, but there doesn't seem to be a parameter or any other way to set it there.
Sorry, I've never used LLM Foundry, but it should be straightforward with Axolotl. A similar config for it:
base_model: /workspace/axolotl/dbrx-checkpoint
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
load_in_8bit: false
# load_in_4bit: true
strict: false
# adapter: qlora
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: false
# lora_fan_in_fan_out:
datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  # - path: /workspace/datasets/dolphin-2.9/Ultrachat200kunfiltered.jsonl
  #   type: sharegpt
  #   conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  # - path: /workspace/datasets/dolphin-2.9/SystemConversations.jsonl
  #   type: sharegpt
  #   conversation: chatml
chat_template: chatml
dataset_prepared_path: dbrx2
val_set_size: 0.01
output_dir: ./out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
wandb_project: dolphin-2.9-Dbrx
wandb_watch:
wandb_run_id:
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
# resume_from_checkpoint: /workspace/axolotl/dbrx-checkpoint
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<|pad|>"
  unk_token: "<|endoftext|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
credit: dolphin-2.9
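Since the config above reads its datasets as `type: sharegpt`, a dataset that holds role/content messages would likely need converting first. A minimal sketch, assuming each record is a list of `{"role", "content"}` messages and that the ShareGPT layout is a `conversations` list of `from`/`value` pairs with `human`/`gpt` speaker names; the function names and the output file name are placeholders, not part of Axolotl:

```python
import json

# Assumed mapping from ChatML roles to ShareGPT speaker names.
ROLE_TO_SHAREGPT = {"system": "system", "user": "human", "assistant": "gpt"}


def to_sharegpt(messages):
    """Convert one role/content conversation into a ShareGPT-style record."""
    return {
        "conversations": [
            {"from": ROLE_TO_SHAREGPT[m["role"]], "value": m["content"]}
            for m in messages
        ]
    }


def write_jsonl(conversations, path):
    """Write one ShareGPT record per line, the JSONL layout used by the paths above."""
    with open(path, "w", encoding="utf-8") as f:
        for messages in conversations:
            f.write(json.dumps(to_sharegpt(messages), ensure_ascii=False) + "\n")


# Hypothetical sample conversation.
example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(json.dumps(to_sharegpt(example)))
# write_jsonl([example], "my-dataset-sharegpt.jsonl")  # placeholder file name
```

The resulting file can then be listed under `datasets:` with `type: sharegpt` and `conversation: chatml`, as in the entries above. An unknown role raises a `KeyError`, which is deliberate here so that unexpected data is noticed early.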