---
license: gemma
---
EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
My quants are meant to fit tightly in 24 GB of VRAM. The following VRAM usage numbers assume **8k context**.

bpw|head bits|VRAM, 4-bit cache|VRAM, 16-bit cache
--:|--:|--:|--:
πŸ‘‰ [**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)|8 bit|21.85 GB|23.69 GB
πŸ‘‰ [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)|8 bit|23.81 GB|25.65 GB
For this model, the difference between a 6-bit and an 8-bit head is only ~300 MB, which is not huge; it could be traded for about 0.1 bpw in the body.
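
To load one of these quants, here is a minimal sketch, assuming the `exllamav2` and `huggingface_hub` Python packages are installed; the `revision` name matches the branch links in the table above:

```python
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

# Download the 5.8 bpw branch (see the table above for VRAM estimates).
model_dir = snapshot_download("mo137/gemma-2-27b-it-exl2", revision="5.8bpw_h8")

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()
config.max_seq_len = 8192  # the 8k context assumed by the table

model = ExLlamaV2(config)
# 4-bit cache, as in the first VRAM column; use ExLlamaV2Cache for 16-bit.
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
```
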
---
Check out turboderp's quants & `measurement.json`:

- [3.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.0bpw)
- [3.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.5bpw)
- [4.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.0bpw)
- [4.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.5bpw)
- [5.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/5.0bpw)
- [6.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/6.0bpw)
- [8.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/8.0bpw)
- [measurement.json](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/blob/main/measurement.json)
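
The `measurement.json` lets you skip the slow measurement pass when making your own EXL2 quants. A minimal sketch, assuming a local exllamav2 checkout and its `convert.py` script; the paths and the 5.8 bpw / 8-bit head values are illustrative:

```python
import subprocess

# Re-quantize using turboderp's measurement.json instead of re-measuring.
subprocess.run([
    "python", "exllamav2/convert.py",
    "-i", "gemma-2-27b-it",          # original FP16 model directory (assumed path)
    "-o", "work",                    # scratch/working directory
    "-cf", "gemma-2-27b-it-5.8bpw",  # output directory for the quantized model
    "-m", "measurement.json",        # reuse the downloaded measurement
    "-b", "5.8",                     # bits per weight (body)
    "-hb", "8",                      # head bits
], check=True)
```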