Thanks, this takes more VRAM. Is a 3.5bpw quant possible?
#1 by async0x42 - opened
Unfortunately, even with a 4090 + 4080 (40 GB VRAM) I can only fit 13056 ctx at 4bpw with Q4 cache, compared to 32K context with the 70B models using exllama 0.1.5. I'm downloading it now to try quantizing it myself, but I'm still hitting the same "no log exists" failure as in my other attempts with other models.
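For context, this is roughly how I'm loading it (a minimal sketch only; the model path is a placeholder, and I'm assuming exllamav2 0.1.5's Q4 cache and autosplit API):

```python
# Minimal loading sketch for an EXL2 quant with a quantized KV cache.
# Paths are placeholders; assumes exllamav2 0.1.5.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

config = ExLlamaV2Config("/models/model-4.0bpw-exl2")  # placeholder path to the 4bpw quant
config.max_seq_len = 13056                             # the most context that fits in 40 GB here

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)            # Q4 KV cache to stretch VRAM further
model.load_autosplit(cache)                            # split layers across the 4090 + 4080

tokenizer = ExLlamaV2Tokenizer(config)
```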
Edit: Actually, if you're able to include the measurement JSON in the repo, that would work too!
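In case it helps, this is how I'd expect to reuse a shared measurement file to build the 3.5bpw quant myself (a sketch only; the paths are placeholders, and I'm assuming the convert.py flags from the exllamav2 repo):

```python
# Sketch: reuse a shared measurement.json to produce a 3.5bpw EXL2 quant
# without redoing the measurement pass. Paths are placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "/models/source-fp16",          # original fp16 weights (placeholder)
        "-o", "/tmp/exl2-work",               # scratch directory for the conversion
        "-cf", "/models/target-3.5bpw-exl2",  # output directory for the finished quant
        "-b", "3.5",                          # target bits per weight
        "-m", "measurement.json",             # reuse the shared measurement file
    ],
    cwd="/path/to/exllamav2",                 # run from a local clone of exllamav2 (placeholder)
    check=True,
)
```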