---
license: gemma
---
EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

My quants are meant to be a tight fit in 24 GB VRAM. The VRAM usage numbers below assume **8k context**.

bpw|head|4 bit cache|16 bit cache
--:|--:|--:|--:
👉 [**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)|8 bit|21.85 GB|23.69 GB
👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)|8 bit|23.81 GB|25.65 GB

For this model the difference between a 6 bit and an 8 bit head is only ~300 MB, which is not huge: it could be exchanged for about 0.1 bpw in the body.
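As a rough sanity check on that trade-off, here is a back-of-the-envelope sketch (my own illustration, not part of exllamav2; the ~27B body parameter count is an assumption):

```python
BODY_PARAMS = 27e9  # approximate parameter count of gemma-2-27b (assumption)

def weight_gb(bpw: float, params: float = BODY_PARAMS) -> float:
    """Memory occupied by `params` weights stored at `bpw` bits each, in GiB."""
    return params * bpw / 8 / 1024**3

# Lowering the body by ~0.1 bpw frees roughly the same memory as
# trading the 8 bit head down to 6 bit:
print(f"{weight_gb(0.1):.2f} GB")  # → 0.31 GB, i.e. ~300 MB
```

The same function also reproduces the ballpark of the table above: the weights alone at 5.8 bpw come to roughly `weight_gb(5.8)` ≈ 18.2 GB, with the head, cache, and activations accounting for the rest.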

---
Check out turboderp's quants & `measurement.json`: