bowenbaoamd committed on
Commit c78de42 · verified · 1 Parent(s): 83a7860

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -31,7 +31,8 @@ python3 quantize_quark.py \
        --kv_cache_dtype fp8 \
        --num_calib_data 128 \
        --model_export quark_safetensors \
-       --no_weight_matrix_merge
+       --no_weight_matrix_merge \
+       --custom_mode fp8
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
        --model_dir $MODEL_DIR \
@@ -41,7 +42,8 @@ python3 quantize_quark.py \
        --num_calib_data 128 \
        --model_export quark_safetensors \
        --no_weight_matrix_merge \
-       --multi_gpu
+       --multi_gpu \
+       --custom_mode fp8
 ```
 ## Deployment
 Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend (vLLM-compatible).
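
For context, serving the exported checkpoint with vLLM typically looks like the sketch below. This is not part of the commit: `$EXPORT_DIR` is a hypothetical path to the `quark_safetensors` output of `quantize_quark.py`, and the exact quantization flag accepted for Quark checkpoints depends on your vLLM version, so check its docs.

```bash
# A minimal serving sketch, assuming vLLM is installed and $EXPORT_DIR
# (hypothetical) points at the quark_safetensors export directory.
# --kv-cache-dtype fp8 mirrors the --kv_cache_dtype fp8 used during
# quantization above; --quantization fp8 is an assumption for this setup.
vllm serve $EXPORT_DIR \
    --quantization fp8 \
    --kv-cache-dtype fp8
```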