Dynamic bnb-4bit

#1
by iqdddd - opened

If I may ask, is there a plan to publish code for quantization in Dynamic 4-bit, or to perhaps offer it for sale?

Unsloth AI org

Each model requires a different quantization scheme. We open-sourced the dynamic quants for our R1-GGUF repo here: https://github.com/unslothai/llama.cpp

@shimmyshimmer If I may ask, why did you (unsloth) name it "dynamic quant"? What's dynamic about it?
What I can see is that the quants are customized per layer and tensor type.

Also, I didn't find any explicit mention of the superweight handling. Is that done implicitly in the imatrix computation?
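
For readers following the thread, the "customized per layer and tensor type" observation can be illustrated with a minimal sketch. This is not Unsloth's actual selection logic: the rule table, the `pick_quant_type` helper, and the `keep_first`/`keep_last` parameters are all hypothetical, and the only things borrowed from llama.cpp are the GGUF-style tensor names (`blk.N.…`) and the quant type labels.

```python
import re

# Hypothetical rule table: (tensor-name pattern, quant type label).
# Illustrative only; these are NOT the rules Unsloth actually uses.
RULES = [
    (r"\.attn_", "Q4_K"),          # attention projections: moderate precision
    (r"\.ffn_down", "Q3_K"),       # down projection: slightly lower
    (r"\.ffn_(up|gate)", "Q2_K"),  # bulk of the parameters: most aggressive
]
DEFAULT_TYPE = "Q4_K"    # assumed fallback for unmatched block tensors
SENSITIVE_TYPE = "Q6_K"  # assumed higher-precision type for sensitive tensors


def pick_quant_type(tensor_name, n_layers, keep_first=2, keep_last=1):
    """Return a quant type label for one tensor, by layer index and tensor kind."""
    m = re.match(r"blk\.(\d+)\.", tensor_name)
    if m is None:
        # Not inside a transformer block (e.g. embeddings, output head).
        return SENSITIVE_TYPE
    layer = int(m.group(1))
    # Assumption: the first and last few layers are kept at higher precision.
    if layer < keep_first or layer >= n_layers - keep_last:
        return SENSITIVE_TYPE
    for pattern, qtype in RULES:
        if re.search(pattern, tensor_name):
            return qtype
    return DEFAULT_TYPE


if __name__ == "__main__":
    for name in ("token_embd.weight", "blk.0.ffn_up.weight",
                 "blk.10.ffn_up.weight", "blk.10.attn_q.weight"):
        print(name, "->", pick_quant_type(name, n_layers=32))
```

Under this reading the assignment is a lookup keyed on layer position and tensor kind. The sketch does not model importance-matrix (imatrix) statistics or any superweight handling, so it does not answer the second question above.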
