Update README.md
README.md
CHANGED
@@ -17,12 +17,18 @@ language:
 - en
 pipeline_tag: text-generation
 ---
+🧀 Which quant is right for you? (all tested!)
+- ***Q3:*** This quant should be used on most high-end modern devices like an RTX 3080. Responses are very high quality, but it's slightly slower than Q4. (Runs at ~1 token per second or less on a Samsung Z Fold 5 smartphone.)
+- ***Q4:*** This quant should be used on high-end modern devices like RTX 3080s, or any GPU, TPU, or CPU that is powerful enough and has at least 15 GB of available memory. (We personally use it on servers and high-end computers.) Recommended.
+- ***Q8:*** This quant should be used on very high-end modern devices that can handle its power. It is very powerful, but Q4 is more well-rounded. Not recommended.
 
+# Information
 - ⚠️ A low temperature must be used to ensure it won't fail at reasoning. We use 0.3 - 0.8!
+- ⚠️ Due to the current prompt format, it may sometimes put <|FinalAnswer|> at the end. You can ignore this or modify the prompt format.
 - This is our flagship model, with top-tier reasoning, rivaling gemini-flash-exp-2.0-thinking and o1 mini. Results are overall similar to both of them. We are not comparing to qwq, as it has much longer results which waste tokens.
 
 
-the model uses this prompt: (modified phi-4 prompt)
+the model uses this prompt format: (modified phi-4 prompt)
 ```
 {{ if .System }}<|system|>
 {{ .System }}<|im_end|>