pszemraj/candle-flanUL2-quantized

Text2Text Generation


pszemraj committed on Aug 27, 2024

Commit ce70db7 · verified · 1 Parent(s): b5b4be0

Update README.md

Files changed (1)

README.md +4 -1


@@ -35,7 +35,10 @@ Below are the weights/file names in this repo:
 | flan-ul2-q4k.gguf       | q4k          | 10.9      |
 | flan-ul2-q6k.gguf       | q6k          | 16        |
-From initial testing, it appears that q2k is too low precision and produces poor/incoherent output. The `q3k` and higher are coherent.
+From initial testing:
+- it appears that q2k is too low precision and produces poor/incoherent output. The `q3k` and higher are coherent.
+- Interestingly, there is no noticeable increase in computation time (_again, on CPU_) when using higher precision quants. I get the same tok/sec for q3k and q6k +/- 0.02
 ## setup