---
license: gemma
---

# EXL2 quants of gemma-2-27b-it

My quants are meant to be a tight fit in 24 GB of VRAM. The following VRAM usage numbers assume 8k context.

| bpw | head | 4 bit cache | 16 bit cache |
|-----|------|-------------|--------------|
| 👉 5.8 | 8 bit | 21.85 GB | 23.69 GB |
| 👉 6.5 | 8 bit | 23.81 GB | 25.65 GB |
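
As a rough sanity check on the table (my own back-of-envelope math, not measured numbers), weight storage alone is about params × bpw / 8 bytes; the head, KV cache, and runtime buffers account for the remainder:

```python
# Back-of-envelope weight footprint for a ~27B-parameter model at a given bpw.
# Actual VRAM use is higher: the output head, KV cache and activations add to it.
params = 27e9

for bpw in (5.8, 6.5):
    weight_bytes = params * bpw / 8
    print(f"{bpw} bpw -> ~{weight_bytes / 1e9:.1f} GB of weights")
```

That comes out to ~19.6 GB and ~21.9 GB of weights respectively, which lines up with the table once a couple of GB of cache and overhead are added.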

For this model, the difference between a 6 bit and an 8 bit head is only ~300 MB, which could instead be spent on about 0.1 bpw in the body.
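
For reference, here is a minimal sketch of loading one of these quants with exllamav2 and a quantized KV cache (the "4 bit cache" column above), assuming exllamav2 is installed and the quant is downloaded locally. The model path and prompt are placeholders:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,   # quantized KV cache; use ExLlamaV2Cache for 16 bit
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("./gemma-2-27b-it-exl2")  # placeholder local path
config.max_seq_len = 8192                          # the 8k context assumed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)                        # fill available GPU(s)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=200))
```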


Check out turboderp's quants & measurement.json (a download sketch follows the list):

- 3.00 bits per weight
- 3.50 bits per weight
- 4.00 bits per weight
- 4.50 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
- 8.00 bits per weight
- measurement.json
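
If you would rather fetch a quant programmatically than through the web UI, huggingface_hub's `snapshot_download` can pull a single branch. The revision below assumes turboderp's usual per-bpw branch naming; check the repo's branch list before relying on it:

```python
from huggingface_hub import snapshot_download

# Download one quant. The revision is an assumed branch name:
# turboderp's EXL2 repos usually keep each bpw on its own branch.
snapshot_download(
    repo_id="turboderp/gemma-2-27b-it-exl2",
    revision="6.0bpw",
    local_dir="gemma-2-27b-it-exl2",
)
```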