---
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
---

# 🚀 CPU optimized quantizations of [DeepSeek-V2-Chat-0628](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) 🖥️ |
|
|
|
### Currently ranked #7 globally in the LMSYS Chatbot Arena Hard Prompts category! 🏆
|
|
|
|
|
>### 🚄 Just download this IQ4XM 132 GiB version; it's the one I use myself in prod:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`
>
>`aria2c -x 9` opens up to 9 connections per file, which usually makes these downloads much faster than a single-connection fetch. Feel free to paste the commands in all at once or one at a time.
|
|
|
```bash
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
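
The shard names follow a fixed `-0000N-of-0000M.gguf` pattern, so the same downloads can be generated with a small loop. This is a minimal sketch, assuming bash and `seq` are available. `shard_url` and `fetch_all` are made-up helper names, and the `-c` flag (resume a partial download) is an addition that the commands above do not use.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repo): build shard URLs from the naming pattern.
BASE_URL="https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main"

# shard_url STEM INDEX TOTAL -> prints the URL of one shard
shard_url() {
  printf '%s/%s-%05d-of-%05d.gguf\n' "$BASE_URL" "$1" "$2" "$3"
}

# fetch_all STEM TOTAL -> downloads every shard, one after another
fetch_all() {
  local stem=$1 total=$2 i url
  for i in $(seq 1 "$total"); do
    url=$(shard_url "$stem" "$i" "$total")
    # -x 9: up to 9 connections per server; -c: resume a partial file; -o: output name
    aria2c -x 9 -c -o "${url##*/}" "$url"
  done
}

# Uncomment to download the four IQ4XM shards:
# fetch_all deepseek_0628_cpu_optimized_iq4xm 4
```

The same call should work for the other quants listed further down, for example `fetch_all deepseek-0628-q8_0 6`.
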
|
>[!TIP]
>Then, to get a command-line conversation interface, all you need is:

```bash
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
# -m points at the first shard (adjust the path if you did not download into ~/r/)
# -t is the CPU thread count; 62 is presumably tuned to my box, so match your own core count
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."
```
|
### 🧠 This IQ4XM version combines GGML type IQ4_XS (4-bit) with Q8_0 (8-bit) for blazing fast performance with minimal loss, leveraging the int8 optimizations on most newer server CPUs.

### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.
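
For a rough sense of what the IQ4_XS plus Q8_0 mix works out to, the file sizes from the perplexity section can be turned into an effective bits-per-weight figure. This is only a sketch under one assumption: every quant holds the same number of weights, so BPW scales linearly with file size, using the q8_0 row (233.41 GiB at 8.50 BPW) as the reference. `effective_bpw` is a made-up helper name, and GGUF metadata overhead is ignored.

```bash
# Hypothetical helper: estimate bits per weight from a file size in GiB.
# Assumption: BPW scales linearly with size; reference is q8_0 at 233.41 GiB / 8.50 BPW.
effective_bpw() {
  awk -v size="$1" 'BEGIN { printf "%.2f\n", size * 8.50 / 233.41 }'
}

effective_bpw 132.1   # IQ4XM
effective_bpw 73.27   # IQ1M, reported by llama.cpp as 2.67 BPW
```

By this estimate IQ4XM lands at roughly 4.8 BPW, which sits between IQ4_XS's nominal 4.25 BPW and Q8_0's 8.5 BPW, as you would expect from a mix that is mostly IQ4_XS.
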
|
|
|
>[!TIP]
>
>📁 No need for file concatenation - just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to delve in baby? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
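
Since `prompt.txt` is optional, here is one way to set it up. This is a minimal sketch: the persona line is borrowed from the earlier example, the thread detection (`nproc` on Linux, `sysctl -n hw.ncpu` on macOS) is an assumption about your system, and the final command is only printed so you can review it before running.

```bash
# Write a starting prompt to prompt.txt (persona text reused from the example above).
cat > prompt.txt <<'EOF'
Adopt the persona of a full-stack developer at NASA JPL.
EOF

# Use every available core unless THREADS is already set.
# nproc is the Linux (coreutils) tool; sysctl is the macOS fallback.
THREADS="${THREADS:-$(nproc 2>/dev/null || sysctl -n hw.ncpu)}"

# Print the command instead of running it, so it can be checked first.
echo ./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  -t "$THREADS" -c 32000 -co -cnv -i -f prompt.txt
```

Whether all logical cores is the best `-t` value depends on the machine; trying a few values and comparing tokens per second is the safer route.
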
|
|
|
### Perplexity benchmarks |
|
|
|
```bash
./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw
```
|
```verilog
//the 4bit iq4xm gets slightly better perplexity than bf16 lol, but the gap (5.8620 vs 5.8734) is far inside the +/- 0.27 error, so it's just noise

deepseek-0628-bf16-00001-of-00011.gguf //16Bit
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek-0628-q8_0-00001-of-00006.gguf //8Bit
model size = 233.41 GiB (8.50 BPW)
perplexity: 49.96 seconds per pass - ETA 2.48 minutes
[1]2.5022,[2]3.3930,[3]2.9422,[4]3.4757,[5]3.8977,[6]4.5114,[7]4.7577,[8]4.9631,[9]5.2926,[10]5.6878,[11]5.7580,[12]5.8782,
Final estimate: PPL = 5.8782 +/- 0.27021

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf //4Bit
model size: 132.1 GiB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853

deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf
model size = 80.58 GiB (2.94 BPW)
[1]2.7202,[2]3.9132,[3]3.5575,[4]4.0150,[5]4.4171,[6]5.0741,[7]5.2683,[8]5.4653,[9]5.8189,[10]6.2432,[11]6.3324,[12]6.4842,
Final estimate: PPL = 6.4842 +/- 0.29700
```
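
To put the final estimates above side by side, the gap to bf16 can be computed directly. This is a rough sketch, not a significance test: it simply checks whether each gap is smaller than the +/- reported for the bf16 run, and with only 12 chunks the error bars are wide. `ppl_report` is a made-up helper name; the numbers are copied from the runs above.

```bash
# Compare each quant's final PPL to the bf16 reference (5.8734 +/- 0.26967).
ppl_report() {
  awk -v base=5.8734 -v err=0.26967 '
    {
      d = $2 - base                       # absolute gap to bf16
      ad = (d < 0) ? -d : d
      verdict = (ad <= err) ? "within bf16 error" : "outside bf16 error"
      printf "%-8s %+.4f (%+.2f%%)  %s\n", $1, d, 100 * d / base, verdict
    }' <<'EOF'
q8_0    5.8782
iq4xm   5.8620
iq2_xxs 6.4842
iq1m    7.1857
iq1_s   7.6295
EOF
}

ppl_report
```

On these numbers q8_0 and iq4xm fall inside the bf16 error, while the 1-bit and 2-bit mixes fall outside it.
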
|
|
|
>[!TIP] |
|
>### 🚄 More scripts for accelerated downloads:
|
> |
|
|
|
```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00002-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00002-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00003-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00003-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00004-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00004-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00005-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00005-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00006-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00006-of-00006.gguf
```
|
|
|
```bash
# 🧠 For the full-brain BF16 version
aria2c -x 8 -o deepseek-0628-bf16-00001-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00001-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00002-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00002-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00003-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00003-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00004-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00004-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00005-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00005-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00006-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00006-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00007-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00007-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00008-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00008-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00009-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00009-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00010-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00010-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```
|
|
|
<figure> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png" alt="deepseek-0628-bf16 example response"> |
|
<figcaption><strong>deepseek-0628-bf16 (440GB):</strong> Example response from full bf16 model</figcaption> |
|
</figure> |
|
|
|
>[!TIP] |
|
>### 🚄 Even more accelerated download links for other quantizations: |
|
> |
|
>🧪 Experimental versions - the IQ1_S and IQ1M 1-bit mixes (averaging 2.13 and 2.67 BPW) are surprisingly coherent!
|
> |
|
|
|
```bash
# 2-bit IQ2_XXS version (80.6 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf

aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf

# Q6K version (187.1 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00002-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00002-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00003-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00003-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00004-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00004-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf

# Q4_0_8_8 faster but dumber version (~169.3 GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf
```
|
|
|
The following 1-bit mixed quant versions are strangely good:
|
|
|
<figure> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/Qxx4p2l0prHiScCdL68XK.png" alt="deepseek_0628_cpu-iq1m example response"> |
|
<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GiB):</strong> The mixed 1-bit response is strangely good</figcaption>
|
</figure> |
|
|
|
|
|
```bash
# IQ1M version (73.27 GiB)
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00001-of-00002.gguf

aria2c -x 8 -o deepseek_0628_cpu-iq1m-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00002-of-00002.gguf
```
|
<figure> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/N0lQ5HAJbNbXIG1MbtB4x.png" alt="deepseek_0628_cpu-iq1s example response"> |
|
<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GiB):</strong> Even the smallest IQ1_S version is coherent with these custom quants</figcaption>
|
</figure> |
|
|
|
|
|
```bash
# IQ1_S version (58.42 GiB)
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00001-of-00002.gguf

aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00002-of-00002.gguf
```
|
📜 Use of the DeepSeek-V2-Chat-0628 model is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL). The DeepSeek-V2 series supports commercial use. It's a fairly permissive license whose restrictions cover things like military use and harming minors, with a clause aimed at patent trolling; check the license text for the full list.
|
|
|
### 🌟 Model Information |
|
|
|
DeepSeek-V2-Chat-0628 is the latest and greatest in the DeepSeek family. This AI powerhouse has climbed the LMSYS Chatbot Arena Leaderboard faster than a rocket on steroids: |
|
|
|
- 🏆 Overall Arena Ranking: #11 global
- 💻 Coding Arena Ranking: #3 global
- 🧠 Hard Prompts Arena Ranking: #7 global, ahead of Claude Opus even on English-only hard prompts
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png) |
|
Want to seek deeper into this model's ocean of weights? Swim over to the [OG model page](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) |
|
|
|
Now go forth and accelerate 🚀💡 |
|
|
|
-Nisten |
|
|