CISCai/Qwen2.5-Coder-0.5B-Instruct-SOTA-GGUF

CISCai committed 4 days ago

Commit 808ff46
Parent: 57c5b22

Upload README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -100,7 +100,7 @@ Generated importance matrix file: [Qwen2.5-Coder-0.5B-Instruct.imatrix.dat](http
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 ```shell
-./llama-cli -ngl 25 -m Qwen2.5-Coder-0.5B-Instruct.IQ4_XS.gguf --color -c 32768 --temp 0.7 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
+./llama-cli -ngl 25 -m Qwen2.5-Coder-0.5B-Instruct.IQ4_NL.gguf --color -c 32768 --temp 0.7 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
 ```
 Change `-ngl 25` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
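
The `+` line of this commit only switches the model file from `IQ4_XS` to `IQ4_NL`. Its prompt template still appears to open the user turn with a bare `<|im_start|>`, whereas the ChatML format used by Qwen2.5 names the role (`<|im_start|>user`). Below is a minimal POSIX shell sketch, not part of the README, that builds the prompt with the role name and passes `-ngl` only when a layer count is supplied. The helper names `build_chatml_prompt` and `run_llama` and the `LLAMA_CLI` override variable are hypothetical. The sampling flags are copied from the command above, and the sketch assumes a `llama-cli` binary in the current directory.

```shell
#!/bin/sh
# Sketch only: build_chatml_prompt, run_llama and LLAMA_CLI are hypothetical
# names for illustration, not part of llama.cpp. The sampling flags are
# copied from the README command in this commit.

# Print a ChatML prompt with real newlines. Unlike the committed template,
# the user turn opens with "<|im_start|>user".
build_chatml_prompt() {
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$1" "$2"
}

# Run llama-cli with the README's sampling settings.
#   $1 = model file, $2 = layers to offload (empty string = CPU only),
#   $3 = user message. The binary defaults to ./llama-cli and can be
#   overridden through the hypothetical LLAMA_CLI variable.
run_llama() {
  local model="$1" ngl="$2" user_msg="$3" prompt
  # Command substitution strips trailing newlines, so append a sentinel
  # "x" and remove it to keep the final newline after "assistant".
  prompt="$(build_chatml_prompt 'You are a helpful assistant.' "$user_msg"; printf x)"
  prompt="${prompt%x}"
  set -- -m "$model" --color -c 32768 --temp 0.7 --top-p 0.8 \
    --top-k 20 --repeat-penalty 1.05 -p "$prompt"
  # Prepend -ngl only when a layer count was given (GPU offload).
  if [ -n "$ngl" ]; then
    set -- -ngl "$ngl" "$@"
  fi
  "${LLAMA_CLI:-./llama-cli}" "$@"
}
```

For example, `run_llama Qwen2.5-Coder-0.5B-Instruct.IQ4_NL.gguf 25 'Write a binary search in C.'` offloads 25 layers, while passing `''` as the second argument omits `-ngl` for CPU-only use. Rebuilding the argument list with `set --` keeps every argument, including the multi-line prompt, correctly quoted without relying on bash arrays.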