kobkrit commited on
Commit
566ea87
·
verified ·
1 Parent(s): 5bcfaac

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +7 -2
README.md CHANGED
@@ -269,9 +269,14 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
269
 
270
  2. Run server
271
  ```bash
272
- vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 2
 
 
 
 
 
 
273
  ```
274
- * Note, change ``--tensor-parallel-size 2`` to the amount of available GPU cards.
275
 
276
  3. Run inference (CURL example)
277
  ```bash
 
269
 
270
  2. Run server
271
  ```bash
272
+ vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 4
273
+ ```
274
+ * Note, change ``--tensor-parallel-size 4`` to the amount of available GPU cards.
275
+
276
+ If you wish to enable tool calling feature, add ``--enable-auto-tool-choice --tool-call-parser hermes`` into command. e.g.,
277
+ ```bash
278
+ vllm serve openthaigpt/openthaigpt1.5-14b-instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes
279
  ```
 
280
 
281
  3. Run inference (CURL example)
282
  ```bash