Update README.md

---
license: gemma
---
EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

My quants are meant to be a tight fit in 24 GB VRAM.

- [**5.8** bpw & **8** bpw head](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)
  should use **21.85 GB VRAM** with 4 bit cache or **23.69 GB** with 16 bit cache
- [**6.5** bpw & **8** bpw head](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)
  should use **23.81 GB VRAM** with 4 bit cache

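For reference, a minimal loading sketch (not part of this repo) showing where the 4 bit cache numbers come from, assuming a recent exllamav2 Python API and `huggingface_hub`; adjust names and versions to your setup:

```python
# Rough sketch, assuming exllamav2 and huggingface_hub are installed.
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Each quant lives on its own branch, so pass the branch name as the revision.
model_dir = snapshot_download("mo137/gemma-2-27b-it-exl2", revision="5.8bpw_h8")

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)

# Q4 = 4 bit KV cache (the 21.85 GB figure); use plain ExLlamaV2Cache for 16 bit.
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Why fit a 27B model in 24 GB?", max_new_tokens=64))
```

Front ends built on exllamav2 usually expose the same choice as a cache-quantization option.
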
The difference between a 6 bit and an 8 bit head is ~300 MB, which is not huge. It could be exchanged for about 0.1 bpw in the body, so 6.6bpw_h6 should use about the same VRAM as 6.5bpw_h8.

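A rough back-of-the-envelope check of that trade-off, using approximate gemma-2-27b shapes (vocab ≈ 256k, hidden size 4608, ~27.2B total parameters; these are my assumptions, not figures from this card):

```python
# Sanity check of the ~300 MB / ~0.1 bpw trade-off (approximate, not file sizes).
head_params = 256_000 * 4608                  # output head: vocab x hidden ≈ 1.18e9
head_delta = head_params * (8 - 6) / 8 / 1e6  # 8 bit -> 6 bit head: ≈ 295 MB saved
body_params = 27.2e9 - head_params            # everything except the head
body_delta = body_params * 0.1 / 8 / 1e6      # +0.1 bpw on the body: ≈ 325 MB added
print(f"head 8->6 bit saves ≈ {head_delta:.0f} MB, +0.1 bpw costs ≈ {body_delta:.0f} MB")
```
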
---
Check out turboderp's quants & `measurement.json`:

- [3.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.0bpw)
- [3.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.5bpw)
- [4.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.0bpw)
- [4.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.5bpw)
- [5.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/5.0bpw)
- [6.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/6.0bpw)
- [8.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/8.0bpw)

[measurement.json](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/blob/main/measurement.json)
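
If you want a size in between the branches above, the `measurement.json` can be reused to skip exllamav2's measurement pass. A hypothetical sketch, run from a checkout of the exllamav2 repo; the paths are placeholders and the `convert.py` flags are my reading of that repo, so double-check them:

```python
# Hypothetical: build your own EXL2 quant while reusing the published measurement.
import subprocess
from huggingface_hub import hf_hub_download

measurement = hf_hub_download("turboderp/gemma-2-27b-it-exl2", "measurement.json")

subprocess.run(
    [
        "python", "convert.py",
        "-i", "/models/gemma-2-27b-it",             # unquantized HF model (placeholder path)
        "-o", "/tmp/exl2-work",                     # scratch directory
        "-cf", "/models/gemma-2-27b-it-6.6bpw_h6",  # finished quant goes here
        "-b", "6.6",                                # target bits per weight (body)
        "-hb", "6",                                 # head bits
        "-m", measurement,                          # reuse the measurement pass
    ],
    check=True,
)
```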