---
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
---
#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, currently the #7 model globally on the LMSYS Arena Hard leaderboard, supercharged for CPU inference! 🖥️
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)
### 🧠 This IQ4XM version combines the GGML IQ4_XS 4-bit quantization type with q8_0 for blazing-fast performance with minimal quality loss, leveraging the int8 optimizations available on most newer server CPUs.
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.
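As a minimal sketch of running the quant with a stock llama.cpp build (the `.gguf` filename below is a placeholder; substitute the exact filename from the file list on this page, and note that older llama.cpp builds name the binary `main` rather than `llama-cli`):

```shell
# Interactive chat with the quantized model using the stock llama.cpp CLI.
# NOTE: the .gguf filename is a placeholder -- use the actual file from this repo.
# -c sets the context length; -i enables interactive mode.
./llama-cli -m ./deepseek-v2-chat-0628-iq4xm.gguf -c 4096 -i
```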