30B 3bit seems pretty sweet by the official evaluation
Thanks for making the 3bit 128G quantized model for the community!
The 30B 3bit 128G model seems to hit a sweet spot that outperforms the 13B fp16 model, which had me just about ready to convert the model myself.
I have been very pleased with locally deploying 13B fp16 models on a dual 3090 server. Now I am going to try two instances of the 30B 3bit model.
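In case it helps anyone else trying the same setup, here is a minimal sketch of loading one instance per GPU, assuming the checkpoint is AutoGPTQ-compatible; the repo id and prompt below are just placeholders, not the actual model:

```python
# Minimal sketch: load a 3-bit 128g GPTQ quant onto a single GPU.
# Assumes an AutoGPTQ-compatible checkpoint; repo id is a placeholder.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "your-user/llama-30b-supercot-3bit-128g"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # pin this instance to one 3090
    use_safetensors=True,
    use_triton=False,      # the CUDA kernels handle 3-bit
)

prompt = "### Instruction:\nSummarize the plot of Hamlet.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running two copies as separate processes, one pinned to cuda:0 and one to cuda:1, should keep the instances independent.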
Glad you are finding a use for it! This one scores 5.22 on WikiText2. I tried publishing a model card with my eval results, but HF is having problems.
Here are the results I got (better to compare against my other two supercot quants, so it's apples to apples); a rough sketch of the eval follows the numbers:
WikiText2: 5.22 (12% worse than 4bit non-groupsize)
PTB: 19.63 (11% worse than 4bit non-groupsize)
C4: 6.93 (7% worse than 4bit non-groupsize)
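Perplexity numbers like these are usually produced by scoring non-overlapping 2048-token windows over the test set. A rough sketch of that style of eval, assuming a regular HF causal LM and the wikitext-2-raw-v1 test split (the function name is just for illustration):

```python
# Rough sketch of a GPTQ-style WikiText2 perplexity eval:
# concatenate the test split, score it in non-overlapping 2048-token windows.
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext2_ppl(model, tokenizer, seqlen=2048, device="cuda:0"):
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)
    nlls = []
    n_chunks = enc.shape[1] // seqlen
    for i in range(n_chunks):
        chunk = enc[:, i * seqlen : (i + 1) * seqlen]
        # labels == inputs: the model shifts internally and returns mean NLL
        loss = model(chunk, labels=chunk).loss
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen)).item()
```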
It is hard to tell the difference, but those results look reasonably accurate. The 3bit version writes somewhat shorter responses, but at least it now fits more text into memory before crashing.
It is still hard to decide whether it is better to use the 4bit model, which is slightly more accurate, or the smaller 3bit model, which leaves more room for context.
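For what it is worth, here is the back-of-the-envelope math behind that trade-off, assuming a LLaMA-30B-like config (roughly 32.5B parameters, 60 layers, hidden size 6656) and an fp16 KV cache; real numbers will differ a bit due to GPTQ packing overhead:

```python
# Back-of-the-envelope VRAM math for the 3-bit vs 4-bit trade-off.
# Assumes a LLaMA-30B-like config and an fp16 KV cache; approximate only.
params = 32.5e9                      # ~32.5B parameters in "30B" LLaMA
for bits in (3, 4):
    weight_gib = params * bits / 8 / 2**30
    print(f"{bits}-bit weights: ~{weight_gib:.1f} GiB")

layers, hidden = 60, 6656
kv_per_token = 2 * layers * hidden * 2   # K and V, 2 bytes each in fp16
print(f"KV cache: ~{kv_per_token / 2**20:.2f} MiB per token")
print(f"2048-token context: ~{2048 * kv_per_token / 2**30:.1f} GiB")
```

By that rough math the 3bit weights save close to 4 GiB over 4bit, which is about what a full 2048-token fp16 KV cache costs, so the extra context headroom looks plausible.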