Update README.md
README.md CHANGED
@@ -269,9 +269,14 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 2. Run server
 ```bash
-vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 2
+vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 4
 ```
-* Note, change ``--tensor-parallel-size 2`` to the amount of available GPU cards.
+* Note: change ``--tensor-parallel-size 4`` to the number of available GPUs.
+
+To enable the tool calling feature, add ``--enable-auto-tool-choice --tool-call-parser hermes`` to the command, e.g.,
+```bash
+vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes
+```
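For context on what the added ``--enable-auto-tool-choice --tool-call-parser hermes`` flags do: the served model can then accept OpenAI-style ``tools`` definitions in chat requests and answer with parsed ``tool_calls``. A minimal sketch of such a request payload follows; the tool name ``get_weather``, its schema, and the default port 8000 are illustrative assumptions, not taken from the README.

```python
import json

# OpenAI-style chat completion payload for a vLLM server started with
# --enable-auto-tool-choice --tool-call-parser hermes.
# The get_weather tool below is a made-up example, not from the README.
payload = {
    "model": "openthaigpt/openthaigpt1.5-14b-instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Bangkok?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize the body; it could then be POSTed to the (assumed) endpoint
# http://localhost:8000/v1/chat/completions, e.g. via curl -d @-.
body = json.dumps(payload)
print(body[:60])
```

With auto tool choice enabled, the server decides per request whether to respond with plain text or with a tool call that the hermes parser extracts from the model output.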
 
 3. Run inference (CURL example)
 ```bash