Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ The Model file is based on ggml-model-Q2_K.gguf.
 
 A single GPU with 24GB of memory can hold 4 layers of data, so num_gpu is set to 4 and num_ctx to 2048.
 
-If there are 8 GPUs with 24GB of GPU memory each, num_gpu can be 32.
+If there are 8 GPUs with 24GB of GPU memory each, num_gpu can be 32. This value can be passed to ollama with /set parameter.
 
 The specific parameters can be changed according to your own tests.
 
@@ -84,9 +84,12 @@ ollama run huihui_ai/perplexity-ai-r1:671b-q2_K
 ```
 /set parameter num_ctx 2048
 ```
-The
+```
+/set parameter num_gpu 32
+```
 
 
+These parameters should be set one at a time; there is no need to send them to ollama all at once.
 
 
 6. **We will upload q2_K shortly.**
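
The num_gpu sizing rule in the changed text can be sketched as a quick calculation. This is only an illustration of the README's stated figures (one 24GB GPU holds 4 layers of this model); the helper name `suggested_num_gpu` is ours, not part of ollama:

```python
# Sketch of the num_gpu sizing described in the README.
# Assumption (from the text): one 24 GB GPU holds 4 layers of this Q2_K model.
LAYERS_PER_24GB_GPU = 4

def suggested_num_gpu(gpu_count: int) -> int:
    """Number of layers to offload (ollama's num_gpu) for gpu_count 24 GB GPUs."""
    return gpu_count * LAYERS_PER_24GB_GPU

print(suggested_num_gpu(1))  # 1 GPU  -> num_gpu 4
print(suggested_num_gpu(8))  # 8 GPUs -> num_gpu 32
```

The resulting value is then set in the ollama REPL with `/set parameter num_gpu 32`, as shown in the diff above.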