Dynamic bnb-4bit
#1, opened by iqdddd
If I may ask, is there a plan to publish the code for Dynamic 4-bit quantization, or perhaps to offer it for sale?
Each model requires different quantization. We open-sourced the dynamic quants for our R1-GGUF repo here: https://github.com/unslothai/llama.cpp
@shimmyshimmer
If I may ask, why did you (unsloth) name it "dynamic quant"? What's dynamic about it?
What I can see is that the quants are customized per layer and tensor type.
Also, I didn't find any explicit mention of superweight handling. Is that done implicitly in the imatrix computation?
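As an illustration of "customized per layer and tensor type", here is a minimal sketch of what such a selection could look like. The tensor names follow the GGUF naming style and the quant type names are existing llama.cpp types, but the rule table itself is entirely hypothetical and is not taken from the Unsloth fork.

```python
import re

# Hypothetical rule table: (regex on tensor name, quant type).
# The first matching rule wins. These choices are assumptions for
# illustration only, not the rules used in any real repository.
RULES = [
    (r"^token_embd\.|^output\.", "Q6_K"),    # embeddings and output head
    (r"^blk\.[0-2]\..*ffn_down", "Q4_K"),    # ffn_down in the first three layers
    (r"ffn_(down|up|gate)_exps", "IQ1_S"),   # MoE expert weights
    (r"attn_", "Q4_K"),                      # attention projections
]
DEFAULT_QUANT = "Q2_K"  # fallback when no rule matches


def pick_quant(tensor_name: str) -> str:
    """Return the quant type for a tensor based on its layer and tensor type."""
    for pattern, quant_type in RULES:
        if re.search(pattern, tensor_name):
            return quant_type
    return DEFAULT_QUANT


if __name__ == "__main__":
    for name in [
        "token_embd.weight",
        "blk.0.ffn_down.weight",
        "blk.10.ffn_down_exps.weight",
        "blk.10.attn_q.weight",
        "blk.10.ffn_down.weight",
    ]:
        print(f"{name:32s} -> {pick_quant(name)}")
```

In this sketch nothing changes at runtime: the choice is fixed per tensor when the model is quantized, which is the behavior the question about the word "dynamic" refers to.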