---
license: gemma
---

# EXL2 quants of gemma-2-27b-it

My quants are meant to be a tight fit in 24 GB of VRAM. The following VRAM usage numbers assume 8k context.

| bpw | head | 4 bit cache | 16 bit cache | Notes |
|-----|------|-------------|--------------|-------|
| 5.8 | 8 bit | 21.85 GB | 23.69 GB | 16 bit cache, but lower BPW |
| 👉 6.5 | 8 bit | 23.81 GB | 25.65 GB | 👈 my recommendation |
| 6.6 | 6 bit | 23.86 GB | 25.70 GB | slightly higher BPW, but less precise head |

For this model the difference between a 6 bit and an 8 bit head is only ~300 MB, which is not huge; it could be traded for about 0.1 bpw in the body.
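
A quick back-of-the-envelope check of that trade-off (a sketch, assuming gemma-2-27b's approximate shapes: vocab ≈ 256000, hidden size ≈ 4608, ~27B weights in total):

```python
# Rough size check, assuming gemma-2-27b's approximate config:
# vocab ~256000, hidden size ~4608, ~27e9 weights overall.
head_params = 256_000 * 4_608               # output head weight count
head_saving = head_params * (8 - 6) / 8     # bytes saved by an 8 -> 6 bit head
print(f"8 -> 6 bit head saves ~{head_saving / 1e6:.0f} MB")     # ~295 MB

body_cost = 27e9 * 0.1 / 8                  # bytes for +0.1 bpw over the body
print(f"+0.1 bpw on the body costs ~{body_cost / 1e6:.0f} MB")  # ~337 MB
```

Both come out near 300 MB, which is why the 6.5 bpw / 8 bit and 6.6 bpw / 6 bit quants land within ~50 MB of each other in the table.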


Check out turboderp's quants & measurement.json:
- 3.00 bits per weight
- 3.50 bits per weight
- 4.00 bits per weight
- 4.50 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
- 8.00 bits per weight
- measurement.json
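
To see how a quant actually fits, you can load it with exllamav2 at the table's settings (8k context, quantized cache) and watch VRAM usage. A minimal sketch, with a hypothetical local path; the class and method names follow exllamav2's bundled examples, so verify them against your installed version:

```python
# Minimal loading sketch: 8k context with a 4 bit cache, as in the table above.
# MODEL_DIR is a hypothetical path to a downloaded quant of this repo.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,   # 4 bit cache; plain ExLlamaV2Cache is the 16 bit column
    ExLlamaV2Tokenizer,
)

MODEL_DIR = "/models/gemma-2-27b-it-exl2-6.5bpw"

config = ExLlamaV2Config(MODEL_DIR)
config.max_seq_len = 8192                    # the 8k context assumed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # allocated while the model loads
model.load_autosplit(cache)                  # splits across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```

Swapping `ExLlamaV2Cache_Q4` for `ExLlamaV2Cache` should reproduce the 16 bit cache column instead.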