readme: edit formatting & add banner

# DeepSeek-V2-Chat-GGUF

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/j_LWkNdegeMjQXuAOFZ1N.jpeg)

Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat).

Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid pace of llama.cpp releases, this will likely change over time.

**If you are using an older quant, please set the metadata KV overrides below.**

# Usage:

**Downloading the bf16:**
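
One way to fetch the bf16 shards is with `huggingface-cli` (a sketch: the repo id, include pattern, and local directory below are placeholders to adjust):

```
# illustrative: download only the bf16 shards of this repo
huggingface-cli download <user>/DeepSeek-V2-Chat-GGUF \
  --include "*bf16*" \
  --local-dir ./DeepSeek-V2-Chat-GGUF
```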

...

```
quantize \
...
(--imatrix [file])
```
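
For reference, a complete quantize call generally has this shape (a sketch: the filenames and target type are illustrative):

```
# illustrative: bf16 -> Q4_K_M, optionally weighted by an importance matrix
./quantize --imatrix imatrix.dat \
  DeepSeek-V2-Chat.bf16.gguf DeepSeek-V2-Chat.Q4_K_M.gguf Q4_K_M
```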

Note: use iMatrix quants only if you can fully offload the model to GPU; otherwise speed will suffer noticeably.
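
Full offload means asking llama.cpp to place every layer on the GPU (a sketch: the model file is illustrative, and a quant this size needs correspondingly large VRAM):

```
# illustrative: -ngl 99 exceeds the layer count, so all layers land on GPU
./main -m DeepSeek-V2-Chat.IQ3_XS.gguf -ngl 99 -p "Hello"
```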

# Quants:

| Quant   | Status    | Size      | Description                                 | KV Metadata | Weighted | Notes                              |
|---------|-----------|-----------|---------------------------------------------|-------------|----------|------------------------------------|
| BF16    | Available | 439 GB    | Lossless :)                                 | Old         | No       | Q8_0 is sufficient for most cases  |
| Q8_0    | Uploading | 233.27 GB | High quality, *recommended*                 | Updated     | Yes      |                                    |
| Q4_K_M  | Available | 132 GB    | Medium quality, *recommended*               | Old         | No       |                                    |
| Q3_K_M  | Uploading | 92.6 GB   | Medium-low quality                          | Updated     | Yes      |                                    |
| IQ3_XS  | Available | 89.6 GB   | Better than Q3_K_M                          | Old         | Yes      |                                    |
| Q2_K    | Available | 80.0 GB   | Low quality, **not recommended**            | Old         | No       |                                    |
| IQ2_XXS | Available | 61.5 GB   | Lower quality, **not recommended**          | Old         | Yes      |                                    |
| IQ1_M   | Uploading | 27.3 GB   | Extremely low quality, **not recommended**  | Old         | Yes      | Testing purposes; use IQ2 at least |

# Planned Quants (weighted/iMatrix):

| Planned Quant | Notes |
|---------------|-------|
| Q5_K_M        |       |
| Q5_K_S        |       |
| Q6_K          |       |
| IQ4_XS        |       |
| IQ2_XS        |       |
| IQ2_S         |       |
| IQ2_M         |       |

Metadata KV overrides (pass them with `--override-kv`; the flag can be specified multiple times):

```
deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
...
deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
```
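
For example, loading an older quant with the overrides applied looks like this (a sketch: the model path and prompt are illustrative, and the overrides elided above must be passed as well):

```
# illustrative: supply the missing metadata at load time
./main -m DeepSeek-V2-Chat.Q4_K_M.gguf \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```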

The `Q8_0` quant (and future uploads) already contains these parameters, so as long as you're running a supported build of llama.cpp, no `--override-kv` flags are required.

A precompiled AVX2 build is available as `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.

...

- MIT license for any repo code

# Performance:

*~1.5 t/s* on a Ryzen 7 3700X (96 GB RAM at 3200 MHz) `[Q2_K]`

# iMatrix:

Find `imatrix.dat` in the root of this repo; it was made with a `Q2_K` quant (see here for background: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693)).

It was generated using `groups_merged.txt`; find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
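
An imatrix like this is produced with llama.cpp's `imatrix` tool (a sketch: the paths are illustrative):

```
# illustrative: compute an importance matrix over the calibration text
./imatrix -m DeepSeek-V2-Chat.Q2_K.gguf -f groups_merged.txt -o imatrix.dat
```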

# Censorship:

This model is somewhat censored; fine-tuning on toxic DPO might help.
|