Uploaded GGUF and exl2 as Phi 3.1
The change in performance is so huge that you're really doing yourselves a disservice by not renaming it! It may get swept under the rug because people will assume you just updated the README.
I've uploaded GGUF and EXL2 here as Phi 3.1:
https://huggingface.co/bartowski/Phi-3.1-mini-4k-instruct-GGUF
https://huggingface.co/bartowski/Phi-3.1-mini-4k-instruct-exl2
Looks like they bumped the mini-128k too.
yeah sadly 128k still isn't supported in llama.cpp :(
```
NotImplementedError: The rope scaling type longrope is not supported yet
```
It's possible you could create them, but without longrope support they would just behave the same as the 4k model in practice.
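For anyone hitting the same error, a quick pre-check of the model's `config.json` shows which rope scaling method it declares before attempting a conversion. This is a minimal sketch: the config dict below only mimics the *shape* of the Phi-3 128k config (the factor values are placeholders, not real), and the set of supported types is an assumption that varies by llama.cpp version.

```python
import json

# Illustrative stand-in for a downloaded config.json; the rope_scaling
# block mirrors the structure of Phi-3-mini-128k's config, with
# placeholder factor values (not the real numbers).
config_json = json.dumps({
    "max_position_embeddings": 131072,
    "rope_scaling": {
        "type": "longrope",          # newer Phi-3 releases declare this
        "short_factor": [1.0] * 48,  # placeholder
        "long_factor": [1.0] * 48,   # placeholder
    },
})

config = json.loads(config_json)

# Assumption: which rope scaling types a given llama.cpp build accepts
# depends on the version; "longrope" is the one that currently raises
# NotImplementedError in the converter.
SUPPORTED_ROPE_TYPES = {"linear", "yarn"}

rope = config.get("rope_scaling") or {}
rope_type = rope.get("type", "none")
if rope_type not in SUPPORTED_ROPE_TYPES and rope_type != "none":
    print(f"rope scaling type {rope_type!r} is not supported yet")
```

Checking this up front avoids kicking off a conversion that fails partway through.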
I thought this had been sorted... https://github.com/ggerganov/llama.cpp/pull/7225
See, I thought it had been too, thank you for finding that. Looking at the changelog, they may have changed it to a new rope method :') it used to be a regular rope with a short factor and a long factor, and now it's their new longrope...
All the more important to distinguish between the versions, then.