Upload Un-quanted model
#1 · by deleted · opened
Sure, I'll upload it tonight (I need my bandwidth for now). In the meantime, I've added a Q3_K_M for you.
Let me know what you think of the model.
Is the tokenizer the same? I'm using the HF samplers, which might cause a problem with the chat format if the token IDs are different.
Yeah, using the same configs as Gemma doesn't work: it outputs ChatML into the chat because the token IDs are different, so please at least upload the JSONs. With the pure llama.cpp loader in TGW it's even more broken. The replies themselves are good, though.
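A quick way to spot this kind of mismatch is to diff the special-token IDs between the two tokenizer configs. The sketch below is a minimal, hypothetical example: it mirrors the `added_tokens` structure of a Hugging Face `tokenizer.json`, and the token names and IDs are illustrative assumptions, not the actual values from this repo.

```python
import json

def special_token_ids(tokenizer_json: str) -> dict:
    """Map special-token strings to their IDs from a tokenizer.json blob."""
    data = json.loads(tokenizer_json)
    return {t["content"]: t["id"] for t in data.get("added_tokens", [])}

def mismatched_tokens(cfg_a: str, cfg_b: str) -> dict:
    """Tokens present in both configs whose IDs differ (chat templating breaks on these)."""
    a, b = special_token_ids(cfg_a), special_token_ids(cfg_b)
    return {tok: (a[tok], b[tok]) for tok in a.keys() & b.keys() if a[tok] != b[tok]}

# Illustrative configs with a deliberate ID mismatch on <start_of_turn>.
base = json.dumps({"added_tokens": [{"content": "<start_of_turn>", "id": 106},
                                    {"content": "<end_of_turn>", "id": 107}]})
tuned = json.dumps({"added_tokens": [{"content": "<start_of_turn>", "id": 5},
                                     {"content": "<end_of_turn>", "id": 107}]})
print(mismatched_tokens(base, tuned))  # → {'<start_of_turn>': (106, 5)}
```

If this returns anything non-empty for the turn-delimiter tokens, the chat template will emit the wrong strings, which matches the ChatML-in-the-chat symptom described above.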
If it's not too difficult, could you also post a smaller version, like an (i1) Q4_K_S? Just something a bit smaller than the Q4_K_M (it doesn't fit in 24 GB of VRAM with 16k context), or maybe even something like an IQ4_XS.