is it ggufable?
opened by sopack
Since most of us mortals don't have huge amounts of VRAM, it'd be cool to have a GGUF version of this model as well.
Yes, here is a converted model: https://huggingface.co/Demonthos/dolphin-2_6-phi-2-candle/blob/main/model-q4k.gguf
And the code to run the model: https://github.com/floneum/floneum/pull/120/files#diff-3397acf5a72f28f207293cb878d25773c9c3f5ab4c4e1fc88eec1a9e9857e033
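In case it helps anyone who doesn't want to build the linked Rust code, here is a minimal sketch of running that q4_k file on CPU with llama-cpp-python instead. The repo id and filename come from the link above; the context size, thread count, and ChatML prompt are my assumptions, and a file converted for candle may or may not load in llama.cpp, so treat this as a starting point rather than the linked implementation.

```python
# Minimal sketch (not the linked floneum/candle code): run a q4_k GGUF on CPU
# with llama-cpp-python. Repo id and filename are from the link above; the
# other parameters are assumptions.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from the Hub.
model_path = hf_hub_download(
    repo_id="Demonthos/dolphin-2_6-phi-2-candle",
    filename="model-q4k.gguf",
)

# Load entirely on CPU; no large VRAM needed.
llm = Llama(model_path=model_path, n_ctx=2048, n_threads=8)

# Dolphin models use the ChatML prompt format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is GGUF?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=128, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```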
I like the idea of QuIP# because the result would be so tiny that it could run on CPU even without ggml, but from what I remember of the paper, 2-bit quantization works less well for smaller models. It might still be better than 3-bit GPTQ, but I think we get better quality from a 5-6 bit GGUF version.
Note that TheBloke just published a GGUF version :)
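If you go with TheBloke's upload, a quick way to see which quantizations are available and grab a 5-6 bit one (per the comment above) is something like the sketch below; the repo id follows his usual naming convention but is an assumption, so verify it on the Hub first.

```python
# Minimal sketch, assuming TheBloke's repo follows his usual layout; the repo
# id is an assumption based on his naming convention, so check it exists.
from huggingface_hub import list_repo_files, hf_hub_download

repo_id = "TheBloke/dolphin-2_6-phi-2-GGUF"  # assumed repo id

# List the available GGUF quantizations and pick a 5-bit K-quant (Q5_K_M),
# which tends to hold up better on small models than 2-3 bit quants.
files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
print(files)

q5_files = [f for f in files if "Q5_K_M" in f]
if q5_files:
    path = hf_hub_download(repo_id=repo_id, filename=q5_files[0])
    print("Downloaded:", path)
```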
ehartford changed discussion status to closed