Update README.md
---
license: apache-2.0
---
### For the GGML conversion script, see:
<https://github.com/LinkSoul-AI/Chinese-Llama-2-7b/tree/main/ggml>

## Definitions of the quantization configurations:

Reposted from: <https://www.reddit.com/r/LocalLLaMA/comments/139yt87/notable_differences_between_q4_2_and_q5_1/>

* q4_0 = 32 numbers per chunk, 4 bits per weight, 1 scale value stored as a 32-bit float (5 bits per value on average); each weight is given by the common scale * quantized value.

* q4_1 = 32 numbers per chunk, 4 bits per weight, 1 scale value and 1 bias value stored as 32-bit floats (6 bits per value on average); each weight is given by the common scale * quantized value + common bias.

* q4_2 = same as q4_0, but with 16 numbers per chunk and a scale value stored as a 16-bit float; same size as q4_0 but better, because the chunks are smaller.

* q4_3 = already dead, but analogous: q4_1 with 16 numbers per chunk, 4 bits per weight, and a 16-bit scale plus a 16-bit bias; same size as q4_1 but better, because the chunks are smaller.

* q5_0 = 32 numbers per chunk, 5 bits per weight, 1 scale value stored as a 16-bit float; size is 5.5 bits per weight.

* q5_1 = 32 numbers per chunk, 5 bits per weight, 1 scale value and 1 bias value each stored as a 16-bit float; size is 6 bits per weight.

* q8_0 = same as q4_0, except 8 bits per weight with 1 scale value at 32 bits, making a total of 9 bits per weight.
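
To make the "common scale * quantized value" formula concrete, here is a minimal NumPy sketch of q4_0-style quantization and dequantization. It is a simplified model of the idea, not the actual ggml bit packing; the function names and the [-8, 7] integer mapping are illustrative assumptions.

```python
import numpy as np

QK = 32  # chunk (block) size used by q4_0

def quantize_q4_0(x: np.ndarray):
    """Toy q4_0-style quantization: one float scale per 32-value chunk,
    each weight stored as a 4-bit integer in [-8, 7]."""
    x = x.reshape(-1, QK)
    # choose the scale so the largest-magnitude value fits the int4 range
    amax = np.abs(x).max(axis=1, keepdims=True)
    scale = amax / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero chunks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q4_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # weight ≈ common scale * quantized value (no bias term in q4_0;
    # q4_1 would add "+ common bias" here)
    return (q * scale).astype(np.float32)

weights = np.random.randn(64).astype(np.float32)
q, s = quantize_q4_0(weights)
approx = dequantize_q4_0(q, s).reshape(-1)
print("max abs error:", np.abs(weights - approx).max())

# Storage cost per 32-value chunk: 32 * 4 bits + one 32-bit scale
# = 160 bits, i.e. 160 / 32 = 5 bits per weight on average,
# matching the q4_0 figure above.
```

The same arithmetic explains the other sizes: q4_1 adds a 32-bit bias per chunk (192 / 32 = 6 bits per weight), and q5_1 uses 16-bit scale and bias with 5-bit weights ((160 + 32) / 32 = 6 bits per weight).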