mo137 committed on
Commit 0519e0e · verified · 1 Parent(s): ffc118a

Update README.md

Files changed (1): README.md +6 -6
README.md CHANGED
@@ -3,14 +3,14 @@ license: gemma
 ---
 EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
 
-My quants are meant to be a tight fit in 24 GB VRAM.
+My quants are meant to be a tight fit in 24 GB VRAM. The following VRAM usage numbers assume **8k context**.
 
-- [**5.8** bpw & **8** bpw head](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)
-  should use **21.85 GB VRAM** with 4 bit cache or **23.69 GB** with 16 bit cache
-- [**6.5** bpw & **8** bpw head](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)
-  should use **23.81 GB VRAM** with 4 bit cache
+bpw|head|4 bit cache|16 bit cache
+--:|--:|--:|--:
+👉 [**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8bpw_h8)|8 bit|21.85 GB|23.69 GB
+👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5bpw_h8)|8 bit|23.81 GB|25.65 GB
 
-The difference between 6 bit and 8 bit head is ~300 MB, it's not huge. It could be exchanged for about 0.1 bpw in the body, so 6.6bpw_h6 should use similar VRAM to 6.5bpw_h8.
+For this model, the difference between a 6 bit and an 8 bit head is ~300 MB; it's not huge. It could be exchanged for about 0.1 bpw in the body.
 
 ---
 Check out turboderp's quants & `measurement.json`:
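As a rough sanity check on the 0.1 bpw figure in the changed README: gemma-2-27b has roughly 27B weights, so an extra 0.1 bit per weight costs about 27e9 × 0.1 / 8 ≈ 0.34 GB, the same ballpark as the ~300 MB saved by dropping the head from 8 bit to 6 bit.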
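For reference, loading one of these quants with the 4 bit cache at the 8k context the table assumes might look like the sketch below. This assumes the exllamav2 Python API; the local model path and prompt are placeholders.

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path: a local download of e.g. the 5.8bpw_h8 branch of this repo.
model_dir = "./gemma-2-27b-it-exl2-5.8bpw_h8"

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 8192  # the 8k context assumed by the VRAM numbers above

model = ExLlamaV2(config)

# Q4 cache corresponds to the "4 bit cache" column in the table.
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=64))
```

Swapping `ExLlamaV2Cache_Q4` for the plain `ExLlamaV2Cache` would give the 16 bit cache column instead, which is what pushes the 6.5 bpw quant past 24 GB.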