Update README.md
README.md CHANGED
@@ -12,6 +12,7 @@ I did not create that model, only discovered it and wanted to try it for myself,

[2.65bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/2.65bpw) using default dataset
[3bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/3bpw) using default dataset
+ [4bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4bpw) using default dataset
[4.35bpw](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw) using default dataset
[4.35bpw-rpcal](https://huggingface.co/aikitoria/Goliath-longLORA-120b-rope8-32k-exl2/tree/4.35bpw-rpcal) using PIPPA dataset
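Each quant lives on its own branch of the repo, so you can fetch one by passing the branch name as the revision. A minimal sketch using huggingface_hub (the local_dir path is just an example):

```python
from huggingface_hub import snapshot_download

# Download one quant; the revision is the branch name from the list above.
snapshot_download(
    repo_id="aikitoria/Goliath-longLORA-120b-rope8-32k-exl2",
    revision="4bpw",  # or "2.65bpw", "3bpw", "4.35bpw", "4.35bpw-rpcal"
    local_dir="models/goliath-4bpw",
)
```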
@@ -31,6 +32,7 @@ context 32k, cache 16: 78.7GiB (fits in A100 80GB)

# Super epic scientific test results
- The 2.65bpw version suffered greatly; it's not completely broken, but it's no good either.
- The 3bpw version hasn't suffered as much; it's much more usable than the 2.65bpw one.
+ - The 4bpw version can be used with CFG, since CFG requires more memory for the context.
- The 4.35bpw version is a bit worse than normal 4k Goliath, but better than Goliath with rope scaling applied for 8k+ context.
- The version using the PIPPA dataset produces worse results than the one using the default dataset at any context length.
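On the CFG bullet: CFG runs the prompt and a negative prompt side by side, so the cache has to hold two sequences and context memory roughly doubles; the smaller 4bpw weights are what leave room for that. A minimal ExLlamaV2 loading sketch under that assumption (the model path is an example, and scale_pos_emb is only needed if the quant's config doesn't already carry the rope-8 factor):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "models/goliath-4bpw"  # example path to a downloaded quant
config.prepare()
config.max_seq_len = 32768   # this model targets 32k context
config.scale_pos_emb = 8.0   # linear rope scale 8, if not read from config.json

model = ExLlamaV2(config)

# batch_size=2 reserves cache for two sequences (prompt + negative prompt),
# which is the extra context memory CFG needs.
cache = ExLlamaV2Cache(model, batch_size=2, lazy=True)
model.load_autosplit(cache)
```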