Low quality IQ quants

#1
by MoonRide - opened

@RichardErkhov Hi there.

When creating IQ quants (IQ4_XS, IQ3_XS), it's essential to first create an imatrix and then use it during quantization. Otherwise, the resulting quant is of lower quality (like the current IQ quants in this repo).

You can find instructions on how to properly create IQ quants (by bartowski) here: https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/discussions/3#66d22e7aa94e473e2ced1758

And a pretty good calibration dataset for creating the imatrix here: https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8

It's also a good habit to include the f16 weights (best for creating an imatrix) in GGUF repos, as well as the .imatrix file (useful if someone wants to produce less commonly used IQ quants from the original f16 GGUF).
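For reference, the two-step workflow described above can be sketched with llama.cpp's tools. The file names here are placeholders, and the calibration file is assumed to be the dataset linked above saved locally:

```shell
# Step 1: compute the importance matrix from the f16 model
# using a calibration dataset (placeholder file names).
./llama-imatrix \
    -m model-f16.gguf \
    -f calibration_datav3.txt \
    -o model.imatrix

# Step 2: quantize to an IQ format, passing the imatrix so the
# quantizer can weight which tensors matter most.
./llama-quantize \
    --imatrix model.imatrix \
    model-f16.gguf \
    model-IQ4_XS.gguf \
    IQ4_XS
```

Repeat step 2 with `IQ3_XS` (or other IQ types) to produce the remaining quants from the same imatrix, which is why publishing the .imatrix file alongside the f16 GGUF is so useful.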

Hi, thank you for the suggestion. Yeah, I should include fp16. As for the imatrix: I don't have enough GPU power for it, so unless someone sponsors me, I am unable to provide that.

MoonRide changed discussion title from Extremely low quality IQ quants to Low quality IQ quants
