---
license: llama2
pipeline_tag: text-generation
---

These are exl2 quants of [Goliath-longLORA-120b-rope8-32k-fp16](https://huggingface.co/grimulkan/Goliath-longLORA-120b-rope8-32k-fp16), which combines Goliath with 32k context.  

I did not create that model; I only discovered it and wanted to try it for myself, so I made these smaller quants.  

# Available versions
[main](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/main) contains the measurement files for the default calibration dataset and for the one used by [goliath-120b-exl2-rpcal](https://huggingface.co/Panchovix/goliath-120b-exl2-rpcal)  

[2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using the default dataset  
[3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using the default dataset  
[4bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4bpw) using the default dataset  
[4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using the default dataset  
[4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using the PIPPA dataset
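
To fetch a single branch instead of cloning the whole repo, here is a minimal sketch using `huggingface_hub` (not something I used myself; the `revision` must match one of the branch names above, and the local path is a placeholder):

```python
from huggingface_hub import snapshot_download

# Download just one quant branch of this repo.
snapshot_download(
    repo_id="aikitoria/Goliath-longLORA-120b-rope8-32k-exl2",
    revision="4bpw",                  # any branch listed above
    local_dir="./goliath-exl2-4bpw",  # placeholder destination
)
```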


# Memory usage tests
### 2.65bpw
context 16k, FP16 cache: 46.9 GiB (fits in 2x 3090)  
context 32k, 8-bit cache: 47 GiB (fits in 2x 3090)  
### 3bpw
context 8k, FP16 cache: 47.4 GiB (fits in 2x 3090)  
context 16k, 8-bit cache: 47.4 GiB (fits in 2x 3090)  
### 4.35bpw
context 16k, FP16 cache: 70.1 GiB (fits in 3x 3090)  
context 32k, 8-bit cache: 70.3 GiB (fits in 3x 3090)  
context 32k, FP16 cache: 78.7 GiB (fits in A100 80GB)  
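
For reference, a rough sketch of how these numbers map onto an exllamav2 load (model path and context length are placeholders; `ExLlamaV2Cache` corresponds to the FP16 cache rows, `ExLlamaV2Cache_8bit` to the 8-bit ones):

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,        # FP16 KV cache
    ExLlamaV2Cache_8bit,   # 8-bit KV cache, roughly halves cache VRAM
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "./goliath-exl2-3bpw"  # placeholder path to a downloaded branch
config.prepare()
config.max_seq_len = 16384                # e.g. the 16k rows above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # swap in ExLlamaV2Cache for FP16
model.load_autosplit(cache)                    # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)
```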

# Super epic scientific test results
- The 2.65bpw version suffered greatly; it's not completely broken, but it's no good either.  
- The 3bpw version hasn't suffered as much and is much more usable than the 2.65bpw one.  
- The 4bpw version leaves enough headroom to use CFG, which requires extra memory for the context.
- The 4.35bpw version is a bit worse than regular 4k Goliath, but better than Goliath with rope scaling applied for 8k+ context.  
- The version using the PIPPA dataset produces worse results than the one using the default dataset at any context length.
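
If you want to run a quick test of your own, a generation sketch building on the load above (sampler values are arbitrary examples, not the settings used for these results):

```python
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # arbitrary example values
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, num_tokens=200))
```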

My current strategy is to use the original Goliath until its context is full and then switch over to this one.