---
license: gemma
---
# EXL2 quants of gemma-2-27b-it
My quants are meant to be a tight fit in 24 GB VRAM. The following VRAM usage numbers assume 8k context.
For this model, the difference between a 6-bit and an 8-bit head is ~300 MB, which is not huge; it could be traded for about 0.1 bpw in the body.
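The "about 0.1 bpw" trade-off can be sanity-checked with a quick back-of-the-envelope calculation (a sketch; the ~27.2e9 parameter count is an assumption, and the 300 MB figure is taken as MiB):

```python
# Rough check: how much body bpw does the ~300 MB head difference buy?
head_diff_bytes = 300 * 1024**2  # ~300 MB difference between 6-bit and 8-bit head
n_params = 27.2e9                # approximate parameter count of gemma-2-27b (assumed)

bpw_equivalent = head_diff_bytes * 8 / n_params
print(f"{bpw_equivalent:.2f} bpw")  # ~0.09 bpw, consistent with "about 0.1 bpw"
```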
Check out turboderp's quants & measurement.json.

Available quants:
- 3.00 bits per weight
- 3.50 bits per weight
- 4.00 bits per weight
- 4.50 bits per weight
- 5.00 bits per weight
- 6.00 bits per weight
- 8.00 bits per weight
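To estimate which quant fits your card, the weight memory for a given bpw can be approximated as follows (a rough sketch; it counts weights only and ignores the KV cache at 8k context, activation buffers, and backend overhead, and assumes ~27.2e9 parameters):

```python
def est_weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GiB for a model quantized at `bpw` bits per weight."""
    return n_params * bpw / 8 / 1024**3

# e.g. ~27.2e9 params at 5.00 bpw -> roughly 15.8 GiB of weights,
# leaving headroom for KV cache and overhead in 24 GB VRAM
print(f"{est_weight_gib(27.2e9, 5.0):.1f} GiB")
```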