Please make i1 quants of my latest 72b model
I really appreciate your work. I just released my best model yet. I would love it if you could make some i1 quants of it.
Sure! It's queued, would be a shame if we didn't have quants for it :)
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Rombo-LLM-V3.0-Qwen-72b-GGUF for quants to appear.
@rombodawg hi! I would like to know why we would use i1 quants at all... like, the general consensus, I think, is that Q4 is pretty good, and anything below is kinda not worth it anymore, where just getting a smaller model at Q4 makes more sense performance-wise.
I have heard of those 1.58-bit quants, which apparently perform surprisingly well, but I'm assuming those are not the same...
Would you mind explaining it to me? I am very curious :)
i1-IQ4_XS, for example, is better quality than Q4_K_M at ~20% smaller size. And i1-IQ1_S quants are ~1.58 bpw and perform "surprisingly well" (but much worse than Q4_K).
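To put bpw in perspective for a 72b model, here is a back-of-the-envelope sketch (the bpw figures are nominal values, and it ignores GGUF metadata overhead, so real file sizes differ a bit):

```python
# Back-of-the-envelope: file size in GB from bits-per-weight (bpw).
# Ignores GGUF metadata and per-tensor overhead, so real files differ a bit.
PARAMS = 72e9  # a 72B-parameter model

for name, bpw in [("i1-IQ4_XS", 4.25), ("i1-IQ1_S", 1.58)]:
    gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name} (~{bpw} bpw): ~{gb:.0f} GB")
```

So the ~1.58 bpw quant of a 72b model fits in roughly 14 GB, versus roughly 38 GB at ~4.25 bpw.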
@mradermacher Oooh, so an i1 quant is not just a "one-bit quant" but some interesting in-between? That is super cool! :o
Is that some new fancy quant method I completely missed? Or has it just always been around?
(also, please @me, otherwise I don't get the notification)
@Smorty100 No, it's just our naming convention for imatrix quants.
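In case you're curious how they're made: llama.cpp first computes an "importance matrix" over some calibration text, then uses it to weight the rounding during quantization. A minimal sketch of that pipeline, assuming the llama-imatrix and llama-quantize tools from a current llama.cpp build are on PATH (all file names here are placeholders):

```python
# Minimal sketch of producing an imatrix ("i1") quant with llama.cpp's tools.
import subprocess

MODEL_F16 = "model-f16.gguf"    # full-precision GGUF conversion of the model
CALIB_TXT = "calibration.txt"   # representative text used to collect statistics
IMATRIX = "imatrix.dat"

# Step 1: run the model over the calibration text and record which weights
# contribute most to the activations (the importance matrix).
subprocess.run(
    ["llama-imatrix", "-m", MODEL_F16, "-f", CALIB_TXT, "-o", IMATRIX],
    check=True,
)

# Step 2: quantize, letting the importance matrix steer precision toward the
# weights that matter most.
subprocess.run(
    ["llama-quantize", "--imatrix", IMATRIX, MODEL_F16,
     "model.i1-IQ4_XS.gguf", "IQ4_XS"],
    check=True,
)
```

The same imatrix gets reused for every quant type (IQ1_S, IQ4_XS, Q4_K_M, ...), which is why the whole i1 family of a model appears together.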
@mradermacher
This model looks like a banger, can we get i1 quants of it please? It's an uncensored R1 distill 70b, no China censorship.
https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b
Queued, on our newest experimental quant node, too. Keep your fingers crossed that it works.
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#r1-1776-distill-llama-70b-GGUF for quants to appear.