aikitoria committed on
Commit
d14ea4b
1 Parent(s): bb199f4

Update README.md

Files changed (1):
  1. README.md +2 -0
README.md CHANGED
@@ -12,6 +12,7 @@ I did not create that model, only discovered it and wanted to try it for myself,
 
 [2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using default dataset
 [3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using default dataset
+[4bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4bpw) using default dataset
 [4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using default dataset
 [4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using PIPPA dataset
 
@@ -31,6 +32,7 @@ context 32k, cache 16: 78.7GiB (fits in A100 80GB)
 # Super epic scientific test results
 - The 2.65bpw version suffered greatly, it's not completely broken, but it's no good either.
 - The 3bpw version hasn't suffered as much, it's much more usable than the 2.65bpw one.
+- The 4bpw version can be used with CFG since that requires more memory for the context.
 - The 4.35bpw version is a bit worse than normal 4k goliath but better than goliath with rope scale applied for 8k+ context.
 - The version using the PIPPA dataset produces worse results than the one using the default dataset on any context length.
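As a rough sanity check on why these bpw levels matter on an 80 GB card, the weight footprint of a 120B-parameter model can be estimated as `params × bpw / 8` bytes. This is a sketch only: it ignores the context cache and framework overhead, which the 78.7 GiB figure quoted in the diff hunk above includes on top of the weights.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight footprint in GiB at a given bits-per-weight."""
    return n_params * bpw / 8 / GIB


# bpw values are the ones listed in this model card; 120e9 params is the
# nominal "120b" size, so these are estimates, not measured numbers.
for bpw in (2.65, 3.0, 4.0, 4.35):
    print(f"{bpw:>4}bpw -> ~{weight_gib(120e9, bpw):.1f} GiB of weights")
```

This gives roughly 37, 42, 56, and 61 GiB of weights respectively, which is why the lower-bpw quants leave more headroom for the 32k context cache (or for the extra cache that CFG needs).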