---
license: gemma
---

# EXL2 quants of gemma-2-27b-it

My quants are meant to be a tight fit in 24 GB of VRAM. The following VRAM usage numbers assume 8k context.

| bpw | head | 4 bit cache | 16 bit cache |
|-----|------|-------------|--------------|
| 👉 5.8 | 8 bit | 21.85 GB | 23.69 GB |
| 👉 6.5 | 8 bit | 23.81 GB | 25.65 GB |
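
As a rough sanity check on the table (my own back-of-envelope math, not measured numbers), weight storage alone is about params × bpw / 8 bytes; the head, KV cache, and runtime buffers account for the remainder:

```python
# Back-of-envelope weight footprint for a ~27B-parameter model at a given bpw.
# Actual VRAM use is higher: the output head, KV cache and activations add to it.
params = 27e9

for bpw in (5.8, 6.5):
    weight_bytes = params * bpw / 8
    print(f"{bpw} bpw -> ~{weight_bytes / 1e9:.1f} GB of weights")
```

That comes out to ~19.6 GB and ~21.9 GB of weights respectively, which lines up with the table once a couple of GB of cache and overhead are added.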

For this model, the difference between a 6 bit and an 8 bit head is only ~300 MB, which could instead be spent on about 0.1 bpw in the body.
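
For reference, here is a minimal sketch of loading one of these quants with exllamav2 and a quantized KV cache (the "4 bit cache" column above), assuming exllamav2 is installed and the quant is downloaded locally. The model path and prompt are placeholders:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_Q4,   # quantized KV cache; use ExLlamaV2Cache for 16 bit
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("./gemma-2-27b-it-exl2")  # placeholder local path
config.max_seq_len = 8192                          # the 8k context assumed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)                        # fill available GPU(s)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=200))
```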


Check out turboderp's quants & measurement.json (a download sketch follows the list):

- 3.00 bits per weight
- 3.50 bits per weight
- 4.00 bits per weight
- 4.50 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
- 8.00 bits per weight
- measurement.json
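
If you would rather fetch a quant programmatically than through the web UI, huggingface_hub's `snapshot_download` can pull a single branch. The revision below assumes turboderp's usual per-bpw branch naming; check the repo's branch list before relying on it:

```python
from huggingface_hub import snapshot_download

# Download one quant. The revision is an assumed branch name:
# turboderp's EXL2 repos usually keep each bpw on its own branch.
snapshot_download(
    repo_id="turboderp/gemma-2-27b-it-exl2",
    revision="6.0bpw",
    local_dir="gemma-2-27b-it-exl2",
)
```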