Commit eceaa8d by maddes8cht
1 Parent(s): 6efe805
"Update README.md"
README.md CHANGED
@@ -50,19 +50,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available.
+There is a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.
-
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 
 # K-quants
 
-K-quants are
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model two times the same question and you may encounter bigger quality differences.
 
 
 
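The added README text says K-quants apply different levels of quantization to specific parts of a model to balance file size and quality. The trade-off can be sketched with a rough size estimate. This is an illustration under stated assumptions, not how any tool actually assigns types: the tensor groups, their weight counts, and the choice of which groups get more bits are all hypothetical. Only the bits-per-weight figures come from the ggml block layouts (256 weights in 144 bytes for Q4_K, 256 weights in 210 bytes for Q6_K).

```python
# Bits per weight, derived from the ggml block layouts:
# Q4_K stores 256 weights in 144 bytes, Q6_K stores 256 weights in 210 bytes.
BITS_PER_WEIGHT = {
    "Q4_K": 144 * 8 / 256,
    "Q6_K": 210 * 8 / 256,
}

# Hypothetical tensor groups and weight counts, made up for illustration only.
TENSOR_WEIGHTS = {
    "attention": 2_000_000_000,
    "feed_forward": 4_000_000_000,
    "output": 500_000_000,
}


def size_in_gib(plan):
    """Estimate the size in GiB of a {tensor group: quant type} plan."""
    total_bits = sum(
        TENSOR_WEIGHTS[group] * BITS_PER_WEIGHT[quant]
        for group, quant in plan.items()
    )
    return total_bits / 8 / 2**30


uniform_q4 = {group: "Q4_K" for group in TENSOR_WEIGHTS}
uniform_q6 = {group: "Q6_K" for group in TENSOR_WEIGHTS}
# Mixed plan: spend extra bits only on groups assumed to be more sensitive.
# Which groups those are is an assumption of this sketch.
mixed = {"attention": "Q6_K", "feed_forward": "Q4_K", "output": "Q6_K"}

for name, plan in [
    ("uniform Q4_K", uniform_q4),
    ("mixed", mixed),
    ("uniform Q6_K", uniform_q6),
]:
    print(f"{name}: {size_in_gib(plan):.2f} GiB")
```

The mixed plan lands between the two uniform plans in size, which is the kind of middle ground the README text describes.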