Heya!
Is this GGUF version of Kunoichi-7B notably different from TheBloke's? Or did you just want to have your own files, for whatever reason?
Just curious!
Hi @Spacellary,
It's just done with the latest llama.cpp release from last week. I wouldn't say it's different, but I am using the main branch of llama.cpp, and I like the models to be compatible with the latest release for my serving setup.
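For anyone curious, the workflow is roughly the following. This is just a minimal sketch assuming a local llama.cpp checkout, with placeholder paths and output names; note that the conversion script and quantize binary have been renamed between llama.cpp releases (e.g. convert_hf_to_gguf.py, llama-quantize in newer ones):

```python
import subprocess

# Convert the HF checkpoint to a 16-bit GGUF first.
# Script/flag names here match an early-2024 llama.cpp checkout.
subprocess.run(
    ["python", "convert.py", "models/Kunoichi-7B",
     "--outtype", "f16", "--outfile", "kunoichi-7b.f16.gguf"],
    check=True,
)

# Quantize the f16 GGUF down to a smaller type, e.g. 4-bit Q4_K_M.
subprocess.run(
    ["./quantize", "kunoichi-7b.f16.gguf",
     "kunoichi-7b.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```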
I see! Are there any improvements expected from recent additions to the master branch?
They do lots of optimizations and release rapidly; I'm not sure whether those are directly connected to the GGUF conversion or the quantization itself. But I know they improved 2-bit quantization quite a lot in the last few weeks, and I am pretty happy with some of the 2-bit quants I made from some merges. Mostly, though, I do it to be sure the latest llama.cpp can serve these GGUF models when it's used for API serving.
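If anyone wants to sanity-check one of these files locally, here's a minimal sketch using llama-cpp-python (the model file name and prompt are placeholders):

```python
from llama_cpp import Llama

# Load a quantized GGUF; n_ctx sets the context window size.
llm = Llama(model_path="kunoichi-7b.Q4_K_M.gguf", n_ctx=4096)

# Run a quick completion to verify the quant loads and generates.
result = llm("Q: What is GGUF? A:", max_tokens=64)
print(result["choices"][0]["text"])
```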