Confirming higher HumanEval for Evol instruct 5bit quant vs. Wikitext based one.
Finally managed to run HumanEval for 5bits and the different calibration presets you created:
Exllama supports greedy decoding by specifying top_k = 1, which I did. There is still very slight randomness, but it only amounts to a character or two.
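For anyone wondering why top_k = 1 amounts to greedy decoding, here is a minimal sketch in plain Python. It is not Exllama's sampler; `sample_top_k` and the example logits are made up for illustration. With only one candidate left after the top-k cut, the sampler has nothing to choose between and always returns the argmax.

```python
import math
import random


def sample_top_k(logits, top_k, rng=random):
    """Sample a token index from the top_k highest logits (hypothetical helper)."""
    # Keep the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax weights over the survivors (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index according to those weights.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.1, 2.5, 2.4, -1.0]  # made-up values for illustration
greedy = max(range(len(logits)), key=lambda i: logits[i])

# With top_k = 1 the single surviving candidate is always the argmax.
always_greedy = all(sample_top_k(logits, 1) == greedy for _ in range(100))
print(greedy, always_greedy)
```

Since the sampling step itself is deterministic at top_k = 1, the remaining character-level differences would have to come from somewhere else, possibly non-deterministic GPU arithmetic, but that is a guess and not something confirmed here.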
The benefit is really significant: in Transformers at 8-bit(!!!) I cannot cross 70% with this model, and 4-bit gets 68%.
Your Evol quant is also much better than the ones I tried to recreate: the best I got is 70.7%, while yours is very close to the officially reported fp16 number (73%, which, by the way, the community has had a hard time reproducing).
Congrats! Would you also share some details on what makes your quant special, i.e. the exact calibration file and the Exllamav2 commit, if possible?
Wiki
  Base: {'pass@1': 0.6890243902439024}
  Base + Extra: {'pass@1': 0.6524390243902439}
Evol
  {'pass@1': 0.725609756097561}
  Base + Extra: {'pass@1': 0.6707317073170732}
REDO
  Base: {'pass@1': 0.7195121951219512}
  Base + Extra: {'pass@1': 0.6707317073170732}
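To put the pass@1 figures in perspective, the sketch below converts them back into solved-problem counts. It assumes the standard 164-problem HumanEval set and one greedy sample per problem, so pass@1 is simply solved / 164. The dictionary keys are my own labels, and the first Evol result has no label in the post.

```python
HUMANEVAL_PROBLEMS = 164  # size of the standard HumanEval set (assumed here)

# pass@1 values copied from the results above; key names are illustrative.
scores = {
    "Wiki / Base": 0.6890243902439024,
    "Wiki / Base + Extra": 0.6524390243902439,
    "Evol (first result, unlabeled)": 0.725609756097561,
    "Evol / Base + Extra": 0.6707317073170732,
    "REDO / Base": 0.7195121951219512,
    "REDO / Base + Extra": 0.6707317073170732,
}


def solved(pass_at_1, n=HUMANEVAL_PROBLEMS):
    """Number of problems solved, given pass@1 from one sample per problem."""
    return round(pass_at_1 * n)


counts = {name: solved(p) for name, p in scores.items()}
for name, count in counts.items():
    print(f"{name}: {count}/{HUMANEVAL_PROBLEMS}")
```

Under those assumptions, the gap between the first Evol result and Wiki / Base is 119 vs. 113 problems, and the REDO run differs from the first Evol result by a single problem.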
@KrisPi
The calibration dataset for the evol quant is linked in the README; it's wizardLM-evol-instruct_70k. As for the commit hash of Exllama2, unfortunately I can't tell exactly, since I've updated exllama2 many times since then. All I can say is that I was using the latest exllama2 at the time, which was about a month ago. Interesting findings about the evol quant; I did not think the calibration dataset would matter much. Guess I'll make some more 4bpw quants using evol-instruct for calibration.
@KrisPi I made new quants using megacode dataset, they seem to be much better. https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2