Upload README.md
README.md
````diff
@@ -100,7 +100,7 @@ Generated importance matrix file: [Qwen2.5-Coder-0.5B-Instruct.imatrix.dat](http
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 
 ```shell
-./llama-cli -ngl 25 -m Qwen2.5-Coder-0.5B-Instruct.
+./llama-cli -ngl 25 -m Qwen2.5-Coder-0.5B-Instruct.IQ4_NL.gguf --color -c 32768 --temp 0.7 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
 ```
 
 Change `-ngl 25` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
````
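A minimal POSIX `sh` sketch of the same invocation, for readers who want to pass their own prompt instead of editing `{prompt}` by hand. It assumes Qwen2.5's ChatML chat template, in which the user turn opens with `<|im_start|>user`; the committed `-p` string has a bare `<|im_start|>` before `{prompt}`, so the sketch adds the `user` role. The function names `build_prompt` and `run_llama` and the `NGL` and `LLAMA_CLI` environment variables are hypothetical and not part of `llama.cpp`. Leaving `NGL` empty drops `-ngl`, matching the advice to remove it without GPU acceleration; every other flag is copied from the command in the diff.

```shell
#!/bin/sh
# Sketch only: wraps the llama-cli command from the README diff.
# build_prompt, run_llama, NGL and LLAMA_CLI are hypothetical names.

MODEL="Qwen2.5-Coder-0.5B-Instruct.IQ4_NL.gguf"

# Wrap one user message in a ChatML prompt (system, user, assistant turns).
# Inside double quotes "\n" stays a literal backslash-n, exactly as in the
# README's -p string; like the README, this relies on llama-cli expanding it.
build_prompt() {
  printf '%s' "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n$1<|im_end|>\n<|im_start|>assistant\n"
}

# Run llama-cli with the README's context size and sampling settings.
# NGL: layers to offload to GPU; unset or empty means no -ngl flag (CPU only).
# LLAMA_CLI: hypothetical override for the binary (default ./llama-cli).
run_llama() {
  # All expansions, including "$1", happen before set -- replaces the args.
  set -- -m "$MODEL" --color -c 32768 --temp 0.7 --top-p 0.8 --top-k 20 \
    --repeat-penalty 1.05 -p "$(build_prompt "$1")"
  if [ -n "${NGL:-}" ]; then
    # Prepend -ngl so the order matches the README command.
    set -- -ngl "$NGL" "$@"
  fi
  "${LLAMA_CLI:-./llama-cli}" "$@"
}

# Example (not executed here):
#   NGL=25 run_llama "Write a binary search in Go."
#   run_llama "Explain what a Makefile does."   # no GPU offload
```

Building the argument list with `set --` keeps the whole prompt as a single argument without needing bash arrays, so the sketch runs under any POSIX shell.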