Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ The Model file is based on ggml-model-Q2_K.gguf.
 
 A single GPU with 24GB of memory can hold 4 layers of data, so num_gpu is set to 4 and num_ctx to 2048.
 
-If there are 8 GPUs with 24GB of GPU memory each, num_gpu can be 32.
+If there are 8 GPUs with 24GB of GPU memory each, num_gpu can be 32. This value can be passed to ollama with /set parameter.
 
 The specific parameters can be changed according to your own tests.
 
@@ -84,9 +84,12 @@ ollama run huihui_ai/perplexity-ai-r1:671b-q2_K
 ```
 /set parameter num_ctx 2048
 ```
-The
+```
+/set parameter num_gpu 32
+```
 
 
+These parameters should be set one at a time; there is no need to send them to ollama all at once.
 
 
 6. **We will upload q2_K shortly.**
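
The num_gpu sizing rule in the changed text can be sketched as a quick calculation. This is only an illustration of the README's stated figures (one 24GB GPU holds 4 layers of this model); the helper name `suggested_num_gpu` is ours, not part of ollama:

```python
# Sketch of the num_gpu sizing described in the README.
# Assumption (from the text): one 24 GB GPU holds 4 layers of this Q2_K model.
LAYERS_PER_24GB_GPU = 4

def suggested_num_gpu(gpu_count: int) -> int:
    """Number of layers to offload (ollama's num_gpu) for gpu_count 24 GB GPUs."""
    return gpu_count * LAYERS_PER_24GB_GPU

print(suggested_num_gpu(1))  # 1 GPU  -> num_gpu 4
print(suggested_num_gpu(8))  # 8 GPUs -> num_gpu 32
```

The resulting value is then set in the ollama REPL with `/set parameter num_gpu 32`, as shown in the diff above.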