30B 3bit seems pretty sweet by the official evaluation
Thanks for making the 3bit 128G quantized model for the community!
The 30B 3bit 128G model seems to hit a sweet spot that outperforms the 13B fp16 model, which had me just about ready to convert the model myself.
I have been very pleased with locally deploying 13B fp16 models on a dual 3090 server. Now I am going to try two instances of the 30B 3bit model.
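In case it helps anyone else trying the same setup, here is a minimal sketch of loading one instance per GPU, assuming the checkpoint is AutoGPTQ-compatible; the repo id and prompt below are just placeholders, not the actual model:

```python
# Minimal sketch: load a 3-bit 128g GPTQ quant onto a single GPU.
# Assumes an AutoGPTQ-compatible checkpoint; repo id is a placeholder.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "your-user/llama-30b-supercot-3bit-128g"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # pin this instance to one 3090
    use_safetensors=True,
    use_triton=False,      # the CUDA kernels handle 3-bit
)

prompt = "### Instruction:\nSummarize the plot of Hamlet.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running two copies as separate processes, one pinned to cuda:0 and one to cuda:1, should keep the instances independent.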
Glad you are finding a use for it! This one scores 5.22 on WikiText2. I tried publishing a model card with my eval results, but HF is having problems.
Here are the results I got (better to compare against my other two supercot quants, so it's apples to apples); a rough sketch of the eval follows the numbers:
WikiText2: 5.22 (12% worse than 4bit non-groupsize)
PTB: 19.63 (11% worse than 4bit non-groupsize)
C4: 6.93 (7% worse than 4bit non-groupsize)
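Perplexity numbers like these are usually produced by scoring non-overlapping 2048-token windows over the test set. A rough sketch of that style of eval, assuming a regular HF causal LM and the wikitext-2-raw-v1 test split (the function name is just for illustration):

```python
# Rough sketch of a GPTQ-style WikiText2 perplexity eval:
# concatenate the test split, score it in non-overlapping 2048-token windows.
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext2_ppl(model, tokenizer, seqlen=2048, device="cuda:0"):
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)
    nlls = []
    n_chunks = enc.shape[1] // seqlen
    for i in range(n_chunks):
        chunk = enc[:, i * seqlen : (i + 1) * seqlen]
        # labels == inputs: the model shifts internally and returns mean NLL
        loss = model(chunk, labels=chunk).loss
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen)).item()
```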
It is hard to tell the difference, but those results look reasonably accurate. The 3bit version writes somewhat shorter responses, but at least it now fits more text into memory before crashing.
It is still hard to decide whether it is better to use the 4bit model, which is slightly more accurate, or the smaller 3bit model, which leaves more room for context.
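For what it is worth, here is the back-of-the-envelope math behind that trade-off, assuming a LLaMA-30B-like config (roughly 32.5B parameters, 60 layers, hidden size 6656) and an fp16 KV cache; real numbers will differ a bit due to GPTQ packing overhead:

```python
# Back-of-the-envelope VRAM math for the 3-bit vs 4-bit trade-off.
# Assumes a LLaMA-30B-like config and an fp16 KV cache; approximate only.
params = 32.5e9                      # ~32.5B parameters in "30B" LLaMA
for bits in (3, 4):
    weight_gib = params * bits / 8 / 2**30
    print(f"{bits}-bit weights: ~{weight_gib:.1f} GiB")

layers, hidden = 60, 6656
kv_per_token = 2 * layers * hidden * 2   # K and V, 2 bytes each in fp16
print(f"KV cache: ~{kv_per_token / 2**20:.2f} MiB per token")
print(f"2048-token context: ~{2048 * kv_per_token / 2**30:.1f} GiB")
```

By that rough math the 3bit weights save close to 4 GiB over 4bit, which is about what a full 2048-token fp16 KV cache costs, so the extra context headroom looks plausible.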