4-bit quantized?

by nacs - opened Apr 4, 2023

Discussion

nacs

Apr 4, 2023

Is there a 4-bit quantized version of this anywhere?

nacs

Apr 4, 2023

edited Apr 4, 2023

Found one in case anyone needs it: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g . Edit: Never mind, that model didn't load for me.

disarmyouwitha

Apr 5, 2023

Does anyone have a walkthrough for how to use GPTQ to 4-bit quantize the weights? I would like to know how to do this for future model releases.

nacs

Apr 5, 2023

@disarmyouwitha The GPTQ-for-llama repo contains instructions for quantizing models.
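For anyone wondering what "4bit-128g" in the model name above refers to: it conventionally means 4-bit weights with one scale/zero-point per group of 128 weights. Below is a minimal sketch, assuming NumPy, of plain round-to-nearest (RTN) group-wise 4-bit quantization. This is NOT GPTQ itself. GPTQ also uses calibration data to compensate for rounding error while it quantizes each layer, so follow the GPTQ-for-llama instructions for the real procedure. The function names `quantize_groups` and `dequantize_groups` are hypothetical.

```python
import numpy as np

# Assumption: plain round-to-nearest (RTN) quantization, NOT GPTQ.
# It only illustrates what "4-bit, group size 128" means for the stored weights.
BITS = 4
GROUP_SIZE = 128  # the "128g" in names like ...-4bit-128g


def quantize_groups(w: np.ndarray, bits: int = BITS, group_size: int = GROUP_SIZE):
    """Quantize a 1-D weight vector to unsigned `bits`-bit codes, one scale/zero per group."""
    if w.ndim != 1 or w.size % group_size != 0:
        raise ValueError("expected a 1-D array whose length is a multiple of group_size")
    qmax = 2 ** bits - 1  # 15 for 4-bit codes
    groups = w.reshape(-1, group_size)
    wmin = groups.min(axis=1, keepdims=True)
    wmax = groups.max(axis=1, keepdims=True)
    # One float scale per group; fall back to 1.0 if a group is constant.
    scale = np.where(wmax > wmin, (wmax - wmin) / qmax, 1.0)
    # Integer zero-point so that wmin maps (approximately) to code 0.
    zero = np.round(-wmin / scale)
    # Real formats pack two 4-bit codes per byte; uint8 is used here for clarity.
    q = np.clip(np.round(groups / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize_groups(q: np.ndarray, scale: np.ndarray, zero: np.ndarray) -> np.ndarray:
    """Map the integer codes back to approximate float weights."""
    return ((q.astype(np.float64) - zero) * scale).reshape(-1)


# Toy "layer" of 4 groups x 128 weights drawn from a small normal distribution.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4 * GROUP_SIZE)
q, scale, zero = quantize_groups(w)
w_hat = dequantize_groups(q, scale, zero)
print("codes range:", q.min(), "-", q.max())
print("max abs error:", np.abs(w - w_hat).max())
```

Because each group gets its own scale, an outlier weight only widens the quantization step for its 128 neighbours rather than for the whole row, at the cost of storing one extra scale and zero-point per group.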

disarmyouwitha

Apr 5, 2023

Thanks, not sure how I missed that ;0;

ekryski

Apr 7, 2023

Found one in case anyone needs: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g . Edit: Nevermind, that model didn't load for me.

I got this one running last night with the latest version of llama.cpp. It was a little slow (though that could also have been because I was running a ton of other stuff in parallel). The results were reasonable: https://twitter.com/ekryski/status/1644275820805103617?s=20.

Definitely promising. Gonna test out some embeddings on it to see how it handles things.

JackReacher23

Apr 26, 2023

@ekryski can you please share the prerequisites for loading the model you mentioned? I wanted to try it out myself in Python as well, but I haven't been able to make sense of all the pieces. It would be really helpful.