Question about chat vector merging

#1
by qutrino - opened

Thank you for releasing such an excellent model.

I noticed that the qwen2.5-bakeneko-32b-instruct model states that "the embedding layer was omitted when performing the subtraction and addition of parameter vectors," but this note is absent in deepseek-r1-distill-qwen2.5-bakeneko-32b.

Could you please clarify which approach was used for this model? If a different method was applied, a brief explanation of the rationale would be greatly appreciated.

Thank you for your help.

rinna Co., Ltd. org

Thanks for your comment! As mentioned in the model card, we merged ALL layers, including the embedding layer, when developing deepseek-r1-distill-qwen2.5-bakeneko-32b.
The DeepSeek-R1 series uses LlamaTokenizer, while the Qwen2.5 series uses QwenTokenizer. We chose to merge every layer so the model could better accommodate this tokenizer transition.
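
For readers unfamiliar with the technique, below is a minimal sketch of chat-vector merging with Hugging Face Transformers. The model IDs and the `skip_keys` set are hypothetical placeholders, not rinna's actual recipe: leaving `skip_keys` empty corresponds to merging all layers, while adding the embedding/output keys would reproduce the "embedding layer omitted" variant described for qwen2.5-bakeneko-32b-instruct.

```python
# Minimal sketch of chat-vector arithmetic (hypothetical model IDs, not rinna's exact recipe).
import torch
from transformers import AutoModelForCausalLM

def load_weights(model_id: str) -> dict[str, torch.Tensor]:
    """Load a model's weights on CPU as a state dict."""
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return model.state_dict()

# Hypothetical placeholders for the three checkpoints involved.
base = load_weights("base-model")      # original base model
donor = load_weights("donor-model")    # model whose tuning is extracted as the chat vector
target = load_weights("target-model")  # model that receives the chat vector

# Keys to exclude from the merge. Leave empty to merge ALL layers;
# e.g. {"model.embed_tokens.weight", "lm_head.weight"} would skip the
# embedding and output layers instead.
skip_keys: set[str] = set()

merged = {}
for key, weight in target.items():
    if key in skip_keys or key not in base or base[key].shape != weight.shape:
        # Skipped, missing, or shape-mismatched (e.g. different vocabulary): keep as-is.
        merged[key] = weight
    else:
        # Chat vector = donor - base; adding it transfers the donor's tuned behavior.
        merged[key] = weight + (donor[key] - base[key])
```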
