Can you make a 4.65 bpw?
a 4.65 can fit perfectly on a 4090 @ 10240 ctx without 8-bit cache. Just a thought. Thanks!
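For reference, a back-of-the-envelope sketch of why that fits. The numbers below assume Yi-34B's published config (60 layers, 8 KV heads of dim 128 under GQA) and an FP16 KV cache; actual usage will be somewhat higher due to activations and framework overhead.

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate quantized weight footprint in GiB at a given bits-per-weight."""
    return n_params * bpw / 8 / 1024**3

def kv_cache_gib(ctx: int, n_layers: int = 60, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Approximate FP16 K+V cache size in GiB for a given context length.
    Defaults assume Yi-34B's GQA config; 2x for K and V tensors."""
    return 2 * ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1024**3

weights = weight_gib(34e9, 4.65)   # ~18.4 GiB of weights
cache = kv_cache_gib(10240)        # ~2.3 GiB of KV cache at 10240 ctx
print(f"{weights:.1f} GiB weights + {cache:.1f} GiB cache = {weights + cache:.1f} GiB")
```

Roughly 20.7 GiB before overhead, which leaves headroom on a 24 GB card without needing the 8-bit cache.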
Sure, I'll take a look when I have cycles available
Thanks! I'd say you can replace 4.25 with 4.65 from now on, as the target audience is the same - people with 24GB of VRAM.
Why is that? 4.25 allows for 16k context, would think people would want that one as well
Finally started it so should be up soon, will ping when it's there
Because a lot of the time, finetunes of the base model cannot handle more than 8k. On jondurbin/airoboros-34b-3.2, it says:
"This is using yi-34b-200k as the base model. While the base model supports 200k context size, this model was fine-tuned with a ctx size of 8k tokens, so anything beyond that will likely have questionable results."
And thank you for the new quant! Downloading now.