About quants
Have you tried quantizing down to the older Q4_0, Q4_1, or Q5_0 formats? I recall Mixtral 8x7B models having problems with the newer quant methods but working fine with the older types.
It's mentioned at the beginning of this article:
https://rentry.org/HowtoMixtral
And Undi mentions it in this model card:
https://huggingface.co/Undi95/Toppy-Mix-4x7B-GGUF
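For anyone who wants to script this instead of using the CLI, here's a minimal sketch against llama.cpp's C API (`llama_model_quantize` from `llama.h`). The file names are placeholders, and the params struct has changed slightly between releases, so treat this as a starting point rather than a drop-in:

```c
// Minimal sketch: re-quantize an f16 GGUF to the older Q4_0 format
// via llama.cpp's C API. Paths are placeholders.
#include <stdint.h>
#include <stdio.h>
#include "llama.h"

int main(void) {
    // Note: some versions expect llama_backend_init() to be called first.
    llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype   = LLAMA_FTYPE_MOSTLY_Q4_0; // or ..._Q4_1 / ..._Q5_0
    params.nthread = 0;                       // <= 0: use all hardware threads

    // Returns 0 on success.
    uint32_t rc = llama_model_quantize("model-f16.gguf",
                                       "model-q4_0.gguf",
                                       &params);
    if (rc != 0) {
        fprintf(stderr, "quantization failed (code %u)\n", rc);
        return 1;
    }
    printf("wrote model-q4_0.gguf\n");
    return 0;
}
```

The bundled quantize example does the same thing from the command line, something like `./quantize model-f16.gguf model-q4_0.gguf Q4_0`.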
No, I didn't try this. Does Q4_0 still exist in the new versions of llama.cpp?
I believe it still supports it; I can still make Q8_0 quants successfully, although I haven't updated it since late January.
It lists all the Q4_0 - Q8_1, Q2_K - Q8_K, and IQ quants in here, so I'd guess they're still in the most recent releases?
https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.h
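If you want to confirm what your local build actually knows about, here's a quick sketch using ggml's public helpers (`ggml_type_name`, `ggml_blck_size`, `ggml_type_size`). The list of type ids below is my assumption from reading that header; drop any that don't exist in your version:

```c
// Print name, block size, and bytes-per-block for the classic
// and k-quant formats declared in ggml.h / ggml-quants.h.
#include <stdio.h>
#include "ggml.h"

int main(void) {
    enum ggml_type types[] = {
        GGML_TYPE_Q4_0, GGML_TYPE_Q4_1, GGML_TYPE_Q5_0, GGML_TYPE_Q5_1,
        GGML_TYPE_Q8_0, GGML_TYPE_Q8_1,
        GGML_TYPE_Q2_K, GGML_TYPE_Q3_K, GGML_TYPE_Q4_K,
        GGML_TYPE_Q5_K, GGML_TYPE_Q6_K, GGML_TYPE_Q8_K,
    };
    for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); ++i) {
        printf("%-6s block=%d bytes=%zu\n",
               ggml_type_name(types[i]),
               (int) ggml_blck_size(types[i]),
               ggml_type_size(types[i]));
    }
    return 0;
}
```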
Alright, I just tried it and it didn't work. This was a great idea, though :)
I tried to quant it too; it's cursed. It crashes Kobold instantly and doesn't even give an error.