Quantization command?
Hi, I've been waiting a long time to try the new exl2-2 quantization (the Reddit post says 2.4bpw is better than 4.5bpw/Q4_K_S), and I finally decided to try it myself not long ago. But my own 3bpw quant with the same pippa calibration dataset gives a very confusing result: 3.8 for your 4.5bpw, 8.17 for your 3bpw, and 13.4 for my 3bpw quant.
I used "python convert.py -i /path -o /out_path -c pippa_raw_fix.parquet -b 3 -hb 6"
Can you share your quantization command please? Do I need any other parameters to improve the quality?
I use a pretty similar command, but I just don't specify -hb (are you setting the head bits with that one?)
py .\convert.py -i 'Goliath-120B-path' -o 'Goliath-120B-3bpw-path' -c pippa_raw_fix.parquet -b 3
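One thing that might help, going from memory on the convert.py flags (check python convert.py -h to be sure): you can save the measurement pass with -om and reuse it with -m, so requantizing at a different bitrate doesn't redo the slow measurement step. Something like:

py .\convert.py -i 'Goliath-120B-path' -o 'work-path' -om 'measurement.json' -c pippa_raw_fix.parquet
py .\convert.py -i 'Goliath-120B-path' -o 'Goliath-120B-3bpw-path' -m 'measurement.json' -c pippa_raw_fix.parquet -b 3

('work-path' and 'measurement.json' are just placeholder names here.)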
Well, I tried it twice, with and without -hb (the default is 6 if not set):
-hb 3 gives 13.8
-hb 6 gives 13.4
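(For context on how numbers like these can be reproduced: one way to measure PPL on a finished quant is exllamav2's test_inference.py with an eval dataset, something like

python test_inference.py -m /out_path -ed wikitext-test.parquet

at least -ed is the eval-dataset flag as I remember it, and the parquet filename here is just a placeholder.)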
Maybe it's just that the new quant method isn't good for big models like 100~120B?
I'll try it again with an older version of exllamav2, to use the old method and figure it out.
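If anyone wants to try the same rollback, something like this should do it (the checkout target is a placeholder, not a real commit):

git clone https://github.com/turboderp/exllamav2
cd exllamav2
git checkout <commit-before-the-new-quant-method>
pip install -r requirements.txt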
Thanks for sharing.
Well, it looks like the 3bpw rpcal quant is just randomly broken, for no reason I can find. Perplexity results:
2.9bpw exl2-2 rpcal: 5.13
2.4bpw exl2-2: 5.96
2.4bpw exl2-2 rpcal: 6.68
3bpw exl2-2 rpcal: 13.8
I'm getting 11~15 t/s with 2.9bpw + speculative decoding (sd) on 2x3090; feels good.
Pretty interesting results, but as you keep testing you'll notice that PPL isn't everything; what matters is whether you like the model itself.
Glad quanting worked for you.
Haha, I know. I'm not trying to get the best score, I just want to know why the scores are different.
And yes, I very much agree with you: PPL isn't everything, and neither are benchmarks. I'd rather use a new model for a few hours to get a feel for it than test its scores.
I've uploaded new quants in any case, if you want to try them. If you do, I also suggest backing up your existing quants first.