mo137 / gemma-2-27b-it-exl2


	---
	license: gemma
	---
	EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

	My quants are meant to be a tight fit in 24 GB of VRAM. The following VRAM usage numbers assume 8k context.

	| bpw | head | 4 bit cache | 16 bit cache | Notes |
	| --: | --: | --: | --: | :-- |
	| [5.8](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8_h8) | 8 bit | 21.85 GB | 23.69 GB | fits with 16 bit cache, but lower BPW |
	| 👉 [6.5](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5_h8) | 8 bit | 23.81 GB | 25.65 GB | 👈 my recommendation |
	| [6.6](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.6_h6) | 6 bit | 23.86 GB | 25.70 GB | slightly higher BPW, but less precise head |
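
If you want to check which quant fits a given VRAM budget, here is a minimal Python sketch built only from the numbers in the table. The `QUANTS` list and the `fitting_quants` helper are hypothetical names made up for this illustration, not part of any library, and the figures only hold for 8k context.

```python
# VRAM figures (GB) copied from the table; all of them assume 8k context.
# Tuple layout: (branch, bpw, head bits, VRAM with 4 bit cache, VRAM with 16 bit cache)
QUANTS = [
    ("5.8_h8", 5.8, 8, 21.85, 23.69),
    ("6.5_h8", 6.5, 8, 23.81, 25.65),
    ("6.6_h6", 6.6, 6, 23.86, 25.70),
]


def fitting_quants(budget_gb, cache_bits):
    """Return the branches whose listed VRAM usage is within budget_gb."""
    if cache_bits not in (4, 16):
        raise ValueError("the table only lists 4 bit and 16 bit cache")
    column = 3 if cache_bits == 4 else 4
    return [quant[0] for quant in QUANTS if quant[column] <= budget_gb]


if __name__ == "__main__":
    print(fitting_quants(24.0, 16))  # only the 5.8 bpw quant fits
    print(fitting_quants(24.0, 4))   # all three fit
```

Keep in mind that the VRAM actually available to the model can be lower than the card's nominal size, so treat the budget you pass in as an estimate.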

	For this model, the difference between a 6 bit and an 8 bit head is ~300 MB, which is not huge. It could be exchanged for about 0.1 bpw in the body.
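
A rough back-of-the-envelope check of that trade-off, as a sketch only. The vocabulary size and hidden size below are assumptions taken from the public gemma-2-27b config, the body parameter count is a round estimate, and the calculation ignores any quantization overhead such as scales.

```python
# Assumptions (not stated in this README):
VOCAB_SIZE = 256_000   # assumed vocabulary size of gemma-2-27b
HIDDEN_SIZE = 4_608    # assumed hidden size of gemma-2-27b
BODY_PARAMS = 26e9     # rough estimate of parameters outside the head


def megabytes(params, bits_per_weight):
    """Size in MB of `params` weights stored at `bits_per_weight` bits each."""
    return params * bits_per_weight / 8 / 1e6


head_params = VOCAB_SIZE * HIDDEN_SIZE

# Memory saved by storing the head at 6 bits instead of 8 bits.
head_saving_mb = megabytes(head_params, 8) - megabytes(head_params, 6)

# Memory cost of raising the body by 0.1 bpw.
body_cost_mb = megabytes(BODY_PARAMS, 0.1)

print(f"head 8 bit -> 6 bit saves about {head_saving_mb:.0f} MB")
print(f"+0.1 bpw in the body costs about {body_cost_mb:.0f} MB")
```

Under these assumptions both numbers land close to 300 MB, which is consistent with the figures above.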

	---
	Check out turboderp's quants & measurement.json:
	[3.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.0bpw)
	[3.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.5bpw)
	[4.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.0bpw)
	[4.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.5bpw)
	[5.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/5.0bpw)
	[6.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/6.0bpw)
	[8.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/8.0bpw)

	[measurement.json](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/blob/main/measurement.json)