nisten committed
Commit 6875282 · verified · 1 Parent(s): 491f30f

Update README.md

Files changed (1): README.md +14 -2
README.md CHANGED
@@ -2,7 +2,17 @@
  base_model: HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  license: apache-2.0
  ---
- ## orpo4ns.gguf is good to go, 2bit also done but not recommended, other quants STILL UPLOADING.
+ ## Alex, download the 3bit orpo3ns.gguf.part0 & orpo3ns.gguf.part1 files then:
+ ```
+ cd ~/Downloads
+
+ cat orpo3ns.gguf.part* > orpo3ns.gguf && rm -rf orpo3ns.gguf.part*
+
+ ```
+
+ For lmStudio you need to copy the full orpo3ns.gguf file to your ~/.cache/lm-studio/models/YourNAME/
+
+ ## orpo4ns.gguf is good to go, 2bit also done but not recommended.
 
 
  # Importance-Matrix quantizations of HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
@@ -52,4 +62,6 @@ Final estimate: PPL = 4.0884 +/- 0.1499
  orpo3ns.gguf 58536MB
  [1]2.8042,[2]3.3418,[3]3.9400,[4]3.5859,[5]3.2042,[6]3.0524,[7]2.9738,[8]2.9695,[9]3.0232,[10]3.0099,[11]3.0510,[12]3.0589,
  Final estimate: PPL = 3.0589 +/- 0.09882
- ```
+ ```
+
+ # The 3bit version is surprisingly usable even though only 58GB
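
A note on the reassembly step added in the first hunk above: the one-liner deletes the part files in the same command that creates the merged file. Below is a minimal, more cautious sketch, assuming the parts sit in ~/Downloads as in the README; the ~58 GB expected size is taken from the perplexity log in the second hunk, and any checksum to compare against is hypothetical (none ships with this commit).
```
cd ~/Downloads
# Reassemble the split GGUF, then verify before removing the parts
cat orpo3ns.gguf.part* > orpo3ns.gguf
ls -lh orpo3ns.gguf          # expect roughly 58 GB for the 3-bit quant
sha256sum orpo3ns.gguf       # compare manually against a published checksum, if one exists
rm -v orpo3ns.gguf.part*     # delete the parts only after the merged file looks right
```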
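
For the LM Studio step in the same hunk, a minimal sketch of the copy, assuming the default ~/.cache/lm-studio/models location named in the README; the YourNAME/zephyr-orpo-141b folder names are placeholders, not something LM Studio mandates.
```
# Create a publisher/model folder and copy the merged GGUF into it
mkdir -p ~/.cache/lm-studio/models/YourNAME/zephyr-orpo-141b
cp ~/Downloads/orpo3ns.gguf ~/.cache/lm-studio/models/YourNAME/zephyr-orpo-141b/
```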
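
The PPL lines quoted in the second hunk are in the format printed by llama.cpp's perplexity tool. A sketch of the kind of invocation that produces such a log, under the assumption that the numbers came from llama.cpp; the binary name, context length, and evaluation file are assumptions, not recorded in this commit.
```
# From a llama.cpp build directory; newer builds name the binary llama-perplexity
./perplexity -m ~/Downloads/orpo3ns.gguf -f wiki.test.raw -c 512
# Prints per-chunk values like [1]2.8042,[2]3.3418,... followed by "Final estimate: PPL = ..."
```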