Qwen 1.5 series pls
@bartowski Thanks @turboderp for the great work. Now that exllamav2 supports the Qwen1.5 models, could you please quantize them? Thank you. In addition, since Qwen1.5 was trained on a large amount of both Chinese and English data, using a mixed Chinese and English calibration dataset when quantizing may reduce the loss of the model's multilingual capabilities during quantization.
It is worth noting that Qwen overemphasizes model safety during the alignment process, which may cause the model to incorrectly refuse to answer some normal questions.
Here is a Chinese dataset that can alleviate the problems caused by over-alignment; I recommend using it for calibration during quantization: https://huggingface.co/datasets/tastypear/unalignment-toxic-dpo-v0.2-zh_cn/tree/main
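For what it's worth, here is a minimal sketch of how such a mixed Chinese/English calibration file could be built and fed to exllamav2's conversion script. The dataset choices, column names, and the -c/--cal_dataset usage are assumptions about how convert.py accepts a custom parquet file, not a verified recipe:

```python
# Hypothetical sketch: build a mixed Chinese/English calibration parquet
# for exllamav2's convert.py. Column handling and split sizes are assumptions.
from datasets import load_dataset
import pandas as pd

# English sample (wikitext is just an illustrative choice)
en = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:500]")
# Chinese sample from the dataset linked above (assumed prompt/chosen columns)
zh = load_dataset("tastypear/unalignment-toxic-dpo-v0.2-zh_cn", split="train[:500]")

rows = [{"text": t} for t in en["text"] if t.strip()]
rows += [{"text": r.get("prompt", "") + "\n" + r.get("chosen", "")} for r in zh]

pd.DataFrame(rows).to_parquet("mixed_cal.parquet")
# Then (assumed CLI):
#   python convert.py -i Qwen1.5-72B-Chat -o work -cf out -b 4.0 -c mixed_cal.parquet
```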
I've successfully quantized Qwen1.5-72B-chat, so I don't think there should be any issues converting more Qwen models. I will do some more versions of the 72B chat model and upload at some point, I believe.
As for calibration data, the default calibration dataset contains a fair amount of Chinese (along with many other languages), so it should be well suited for multilingual models like Qwen.
@Pevernow 7b up, 14b in the works
@bartowski @turboderp Thanks a lot.
Unfortunately, the Qwen1.5 models are not very friendly to consumer-grade PCs due to the lack of GQA and the resulting large memory usage.
Also, would you be interested in making an exl2 quantization of https://huggingface.co/CausalLM/14B-DPO-alpha?
I noticed this model is from December last year, but I couldn't find an exl2 quantization for it. The only one available is of the non-DPO version, which is lower quality.
Can you please quantize it? Thank you.
yeah the lack of GQA is rough.
I'll make that 14B DPO alpha once my current one is done
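To put the GQA point in rough numbers, here is a small back-of-the-envelope sketch of KV-cache size. The layer/head counts are assumptions based on a typical Qwen1.5-72B config, and the GQA comparison uses 8 KV heads purely as a hypothetical:

```python
# Rough KV-cache estimate in fp16. The config numbers below are assumptions
# (80 layers, 64 heads, head_dim 128 for a 72B-class model); adjust as needed.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys + values
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Without GQA, every attention head stores its own K/V
no_gqa = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=8192)
# Hypothetical GQA with 8 KV heads for comparison
with_gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=8192)

print(f"no GQA:   {no_gqa / 2**30:.1f} GiB")   # ~20 GiB at 8k context
print(f"with GQA: {with_gqa / 2**30:.1f} GiB") # ~2.5 GiB at 8k context
```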
no i think it needs tokenizer.model or tokenizer.json
According to the official introduction, this model is fully compatible with Llama2 and uses the Llama2 architecture. Maybe the tokenizer is a GPT2Tokenizer and was omitted because it is very common?
https://huggingface.co/cgus/CausalLM-14B-exl2/tree/main
In addition, at the link above I found an exl2 quantization of the non-DPO version, and it includes a tokenizer.model, even though the original model page does not provide that file.
Another possible guess: maybe if you load the model with transformers, the missing tokenizer file will be generated or downloaded automatically?
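If the slow-tokenizer files in the repo are enough for transformers to build a fast tokenizer, re-saving it should emit a tokenizer.json. A minimal sketch, assuming the repo loads with AutoTokenizer (the output path is just a placeholder):

```python
# Sketch: try to export tokenizer.json from a repo that only ships
# slow-tokenizer files. Whether this works depends on the repo's tokenizer type.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CausalLM/14B-DPO-alpha")
# save_pretrained writes tokenizer.json when a fast tokenizer backend is available
tok.save_pretrained("./14B-DPO-alpha-tokenizer")
```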
no i think it needs tokenizer.model or tokenizer.json
@bartowski Any updates?