Broken
Seems to be broken, unfortunately. I tried a GGUF quant from my repo, https://huggingface.co/lemon07r/llama-3-SNAMD-8B-Q8_0-GGUF, and that didn't work, so I downloaded the original repo and quantized it myself using the latest version of llama.cpp. That didn't work either. I also tried two different versions of koboldcpp to load it, with the Vulkan, OpenBLAS, and hipBLAS backends; all of them just open and close before I can see the error.
http://5.9.86.149/hf/llama-3-SNAMD-8B/
You can try this. I had the big blender bot on the KoboldAI Discord server do the same merge using your YAML config, so it should work.
Nope, this one is broken too. Same issue, even when I tried to quant it myself. Any idea why it doesn't work?
I will try quanting it tomorrow (I broke my Linux workstation, whoops) and share the config if it works. The safetensors work for me using Transformers on a GPU.
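If anyone wants to reproduce that check before quantizing, a quick smoke test along these lines should be enough (the local path is a placeholder):

```bash
python - <<'PY'
# Minimal smoke test: load the merged safetensors with Transformers and generate a few tokens.
# "./llama-3-SNAMD-8B" is a placeholder for wherever the repo was downloaded.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./llama-3-SNAMD-8B"
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16).to("cuda")
inputs = tok("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=16)
print(tok.decode(out[0], skip_special_tokens=True))
PY
```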
@lemon07r
Could it be a problem with the tokenizer? Something like that happened when some users tried to quant Stheno-Mahou. Lewdiculous made it work in that thread:
https://huggingface.co/nbeerbower/llama-3-Stheno-Mahou-8B/discussions/1#66577ab60c9058052fd84ffe
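A quick way to rule that out before converting is to check that the tokenizer loads and round-trips cleanly (the path is a placeholder):

```bash
python - <<'PY'
# Rough tokenizer sanity check; a broken or mismatched tokenizer usually
# fails here before the GGUF conversion ever runs. Path is a placeholder.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./llama-3-SNAMD-8B")
text = "The quick brown fox jumps over the lazy dog."
ids = tok.encode(text)
print(len(ids), ids[:10])
print(tok.decode(ids, skip_special_tokens=True))
PY
```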
Hi, I finished quantizing this model here: https://huggingface.co/emnakamura/llama-3-SNAMD-8B-GGUF

You need to use llama.cpp's `convert-hf-to-gguf.py` script. We use `--outtype f32` to minimize data loss when converting to GGUF.
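For reference, the conversion step looks roughly like this from a llama.cpp checkout (paths are placeholders):

```bash
# Convert the HF safetensors to a full-precision f32 GGUF first;
# quantize from that f32 file afterwards. Paths are placeholders.
python convert-hf-to-gguf.py /path/to/llama-3-SNAMD-8B \
    --outtype f32 \
    --outfile llama-3-SNAMD-8B-f32.gguf
```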
That's exactly how I did it, though: latest version of llama.cpp on Fedora 40, converting to f32 first. I tried both your weights and my own weights (from the same recipe). Not sure why the quant didn't work. I'll give yours a try.
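For reference, this is roughly what I ran for the quant step (the binary is named `llama-quantize` in newer llama.cpp builds; paths are placeholders):

```bash
# Quantize the f32 GGUF down to Q8_0 with llama.cpp's quantize tool.
./quantize llama-3-SNAMD-8B-f32.gguf llama-3-SNAMD-8B-Q8_0.gguf Q8_0
```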
EDIT: Do you have a Q8_0 I can test with? That's the quant that didn't work for me.