Tokenizer config is wrong
Qwen always uses Qwen2Tokenizer.
Sorry, I updated the tokenizer class in the first comment. The current tokenizer config lists the tokenizer class as LlamaTokenizerFast.
@bartowski sorry if this is something you were already aware of, but could this be causing some of the issues with local usage? I checked, and it seems all the Qwen-based distills have the same Llama tokenizer class instead of the Qwen one used by their respective base models.
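For anyone who wants to repeat that check locally, here is a rough sketch. It assumes the model repos are already downloaded to local directories and that each one has a `tokenizer_config.json` with a `tokenizer_class` field. The helper names `read_tokenizer_class` and `find_mismatches` are made up for illustration and are not part of any library.

```python
import json
from pathlib import Path


def read_tokenizer_class(model_dir):
    """Return the `tokenizer_class` value from tokenizer_config.json, or None if the key is absent."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("tokenizer_class")


def find_mismatches(model_dirs, expected_class):
    """Map each directory whose tokenizer_class differs from expected_class to the value found."""
    mismatches = {}
    for model_dir in model_dirs:
        found = read_tokenizer_class(model_dir)
        if found != expected_class:
            mismatches[str(model_dir)] = found
    return mismatches
```

Calling `find_mismatches(["./some-qwen-distill"], "Qwen2Tokenizer")` (the path is a placeholder) should return an empty dict if the config matches, or the directory mapped to whatever class it actually declares.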
It seems unlikely, since llama.cpp uses its own tokenizer. However, it is possible that the existing conversion code was based on an incorrect tokenizer.
But I think that still shouldn't be a problem for the final result.
I've seen people have better results with lower temperature and proper prompting
@ngxson any thoughts?