4-bit quantized?
Is there a 4-bit quantized version of this anywhere?
Found one in case anyone needs it: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g . Edit: Never mind, that model didn't load for me.
Does anyone have a walkthrough for how to use GPTQ to 4-bit quantize the weights? I would like to know how to do this for future model releases.
@disarmyouwitha The GPTQ-for-llama repo contains quantizing instructions
Thanks, not sure how I missed that ;0;
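For anyone wondering what "4-bit with group size 128" means for the weights themselves, here is a minimal sketch. It is an illustration only, under loud assumptions: it uses plain round-to-nearest quantization with one scale and offset per group of 128 weights, which is NOT the GPTQ algorithm (GPTQ additionally adjusts the not-yet-quantized weights to compensate for rounding error). The function names are hypothetical and do not come from the GPTQ-for-llama repo; follow the repo's own instructions for real models.

```python
from typing import List, Tuple

# One quantized group: integer codes plus the scale and offset needed to decode them.
Group = Tuple[List[int], float, float]


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group (hypothetical helper, not GPTQ)."""
    levels = (1 << bits) - 1               # 15 distinct steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def quantize_row(row: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Split a weight row into groups and quantize each group independently."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Reconstruct approximate float weights from the stored codes."""
    restored: List[float] = []
    for codes, scale, lo in groups:
        restored.extend(c * scale + lo for c in codes)
    return restored


if __name__ == "__main__":
    import random

    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(256)]  # made-up weights
    groups = quantize_row(row)
    restored = dequantize_row(groups)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, worst reconstruction error {worst:.6f}")
```

Smaller groups mean more scales and offsets to store but a tighter fit to the local weight range, which is the trade-off the group size controls.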
Found one in case anyone needs it: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g . Edit: Never mind, that model didn't load for me.
I got this one running last night with the latest version of llama.cpp. A little slow (that could also have been me running a ton of stuff in parallel). Reasonable results: https://twitter.com/ekryski/status/1644275820805103617?s=20.
Definitely promising. Gonna test out some embeddings on it to see how it handles things.
@ekryski could you please share the prerequisites for loading the model you mentioned? I'd like to try it out myself in Python, but I haven't been able to make sense of all the pieces. It would be really helpful.