Update README.md
README.md
@@ -35,7 +35,10 @@ Below are the weights/file names in this repo:
 | flan-ul2-q4k.gguf | q4k | 10.9 |
 | flan-ul2-q6k.gguf | q6k | 16 |
 
-From initial testing
+From initial testing:
+
+- It appears that q2k is too low-precision and produces poor/incoherent output; `q3k` and higher are coherent.
+- Interestingly, there is no noticeable increase in computation time (_again, on CPU_) when using higher-precision quants. I get the same tok/sec for q3k and q6k (±0.02).
 
 ## setup
 
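To reproduce a tok/sec comparison like the q3k vs q6k one above, the following minimal sketch times repeated generations and reports mean and spread. It assumes a hypothetical `generate(prompt)` callable that wraps whichever GGUF runtime you use and returns the generated tokens. It is not tied to any specific library API, and the 0.02 tolerance simply mirrors the spread quoted above.

```python
import time
from statistics import mean, stdev
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[str], Sequence[int]],  # hypothetical wrapper around your GGUF runtime
    prompt: str,
    runs: int = 3,
) -> tuple[float, float]:
    """Run `generate` several times and return (mean, stdev) of tokens/sec."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        # Guard against a zero interval on very fast dummy callables.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(len(tokens) / elapsed)
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread


def same_throughput(a: float, b: float, tolerance: float = 0.02) -> bool:
    """True if two tok/sec figures differ by no more than `tolerance`."""
    return abs(a - b) <= tolerance
```

For a fair comparison, run both quants with the same prompt, thread count, and generation length, and compare their means with `same_throughput`.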