4-bit quantized?

#7 opened by nacs

Is there a 4-bit quantized version of this anywhere?

Found one in case anyone needs: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g . Edit: Nevermind, that model didn't load for me.

Does anyone have a walkthrough for how to use GPTQ to 4-bit quantize the weights? I would like to know how to do this for future model releases.

@disarmyouwitha The GPTQ-for-llama repo contains quantizing instructions
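For anyone who would rather do this from Python than from the repo's command-line scripts: the separate AutoGPTQ library implements the same GPTQ procedure behind a small API. This is only a rough sketch under my assumptions (placeholder model paths, a single calibration sentence standing in for a real calibration set, and an API that may differ between versions):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "path/to/fp16-model"           # placeholder: the fp16 model to quantize
quantized_dir = "path/to/4bit-128g-output"  # placeholder: where to save the result

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# GPTQ needs calibration samples; a real run would use a few hundred (e.g. from C4),
# not this single stand-in sentence.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

# 4-bit weights with group size 128, matching the "-4bit-128g" naming convention.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir)
```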

Thanks, not sure how I missed that ;0;


I got this one running last night with the latest version of llama.cpp. A little slow (though that could also be because I was running a ton of stuff in parallel). Reasonable results: https://twitter.com/ekryski/status/1644275820805103617?s=20.

Definitely promising. Gonna test out some embeddings on it to see how it handles things.
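For anyone who wants to drive this from Python instead of the llama.cpp binary, here's a rough sketch using the llama-cpp-python bindings; the model path, prompt format, and sampling parameters are just placeholders for whatever 4-bit ggml file and settings you actually use:

```python
from llama_cpp import Llama

# Placeholder path to a 4-bit ggml model file prepared for llama.cpp.
llm = Llama(model_path="./ggml-model-q4_0.bin", n_ctx=2048, n_threads=8)

# Alpaca-style prompt as an illustration; adjust to the model's expected format.
output = llm(
    "### Instruction:\nExplain what 4-bit quantization does.\n\n### Response:\n",
    max_tokens=256,
    temperature=0.7,
    stop=["### Instruction:"],
)

print(output["choices"][0]["text"])
```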

@ekryski could you please list the prerequisites for loading the model you mentioned? I'd like to try it out myself in Python but haven't been able to make sense of all the steps. It would be really helpful.
