---
license: gemma
---
EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

My quants are meant to be a tight fit in 24 GB VRAM. The VRAM usage numbers below assume **8k context**.

bpw|head|4 bit cache|16 bit cache
--:|--:|--:|--:
👉 [**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)|8 bit|21.85 GB|23.69 GB
👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)|8 bit|23.81 GB|25.65 GB

For this model the difference between a 6 bit and an 8 bit head is only ~300 MB, which is not huge: it could be exchanged for about 0.1 bpw in the body.
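As a rough sanity check on that trade-off, here is a back-of-the-envelope sketch (my own illustration, not part of exllamav2; the ~27B body parameter count is an assumption):

```python
BODY_PARAMS = 27e9  # approximate parameter count of gemma-2-27b (assumption)

def weight_gb(bpw: float, params: float = BODY_PARAMS) -> float:
    """Memory occupied by `params` weights stored at `bpw` bits each, in GiB."""
    return params * bpw / 8 / 1024**3

# Lowering the body by ~0.1 bpw frees roughly the same memory as
# trading the 8 bit head down to 6 bit:
print(f"{weight_gb(0.1):.2f} GB")  # → 0.31 GB, i.e. ~300 MB
```

The same function also reproduces the ballpark of the table above: the weights alone at 5.8 bpw come to roughly `weight_gb(5.8)` ≈ 18.2 GB, with the head, cache, and activations accounting for the rest.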

---
Check out turboderp's quants & `measurement.json`: