---
license: gemma
---

# EXL2 quants of gemma-2-27b-it

My quants are meant to be a tight fit in 24 GB of VRAM. The following VRAM usage numbers assume 8k context.

| bpw | head | 4 bit cache | 16 bit cache | Notes |
|-----|------|-------------|--------------|-------|
| 5.8 | 8 bit | 21.85 GB | 23.69 GB | 16 bit cache, but lower BPW |
| 👉 6.5 | 8 bit | 23.81 GB | 25.65 GB | 👈 my recommendation |
| 6.6 | 6 bit | 23.86 GB | 25.70 GB | slightly higher BPW, but less precise head |

For this model the difference between a 6 bit and an 8 bit head is only ~300 MB, which is not huge; it could be traded for about 0.1 bpw in the body.
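
A quick back-of-the-envelope check of that trade-off (a sketch, assuming gemma-2-27b's approximate shapes: vocab ≈ 256000, hidden size ≈ 4608, ~27B weights in total):

```python
# Rough size check, assuming gemma-2-27b's approximate config:
# vocab ~256000, hidden size ~4608, ~27e9 weights overall.
head_params = 256_000 * 4_608               # output head weight count
head_saving = head_params * (8 - 6) / 8     # bytes saved by an 8 -> 6 bit head
print(f"8 -> 6 bit head saves ~{head_saving / 1e6:.0f} MB")     # ~295 MB

body_cost = 27e9 * 0.1 / 8                  # bytes for +0.1 bpw over the body
print(f"+0.1 bpw on the body costs ~{body_cost / 1e6:.0f} MB")  # ~337 MB
```

Both come out near 300 MB, which is why the 6.5 bpw / 8 bit and 6.6 bpw / 6 bit quants land within ~50 MB of each other in the table.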


Check out turboderp's quants & measurement.json:
- 3.00 bits per weight
- 3.50 bits per weight
- 4.00 bits per weight
- 4.50 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
- 8.00 bits per weight
- measurement.json
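
To see how a quant actually fits, you can load it with exllamav2 at the table's settings (8k context, quantized cache) and watch VRAM usage. A minimal sketch, with a hypothetical local path; the class and method names follow exllamav2's bundled examples, so verify them against your installed version:

```python
# Minimal loading sketch: 8k context with a 4 bit cache, as in the table above.
# MODEL_DIR is a hypothetical path to a downloaded quant of this repo.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,   # 4 bit cache; plain ExLlamaV2Cache is the 16 bit column
    ExLlamaV2Tokenizer,
)

MODEL_DIR = "/models/gemma-2-27b-it-exl2-6.5bpw"

config = ExLlamaV2Config(MODEL_DIR)
config.max_seq_len = 8192                    # the 8k context assumed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # allocated while the model loads
model.load_autosplit(cache)                  # splits across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```

Swapping `ExLlamaV2Cache_Q4` for `ExLlamaV2Cache` should reproduce the 16 bit cache column instead.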