---
base_model:
- deepseek-ai/DeepSeek-V3
---

# DeepSeek V3 - INT4 (TensorRT-LLM)

This repository provides an INT4-quantized version of the DeepSeek V3 model, suitable for high-speed, memory-efficient inference with TensorRT-LLM.

### Model Summary

- Base Model: DeepSeek V3 (BF16, converted from NVIDIA's FP8 checkpoint)
- Quantization: Weight-only INT4 (W4A16)
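
W4A16 means the weights are stored as 4-bit integers while activations stay in 16-bit floating point; at inference time each weight group is dequantized with its stored scale before the matmul. The following NumPy sketch illustrates the idea of symmetric per-group INT4 quantization. It is illustrative only (the function names and the group size of 128 are assumptions for this example, not TensorRT-LLM's actual kernels):

```python
import numpy as np

def quantize_w4(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 2-D weight matrix.

    Weights are grouped along the input dimension; each group gets one
    float scale so that its values map into the signed 4-bit range [-8, 7].
    """
    rows, cols = w.shape
    w_groups = w.reshape(rows, cols // group_size, group_size)
    # One scale per group: the max |w| in the group maps to 7.
    scales = np.abs(w_groups).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(w_groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate FP weight matrix (done on the fly at inference)."""
    rows = q.shape[0]
    return (q.astype(np.float32) * scales).reshape(rows, -1)

# Quantize a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 256)).astype(np.float32)
q, s = quantize_w4(w, group_size=128)
w_hat = dequantize_w4(q, s)
err = np.abs(w - w_hat).max()
```

Per-group scales bound the rounding error of each group to half a quantization step, which is why weight-only INT4 retains most model quality while cutting weight memory to roughly a quarter of BF16.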

Convert and quantize the checkpoint with TensorRT-LLM's `convert_checkpoint.py`:

```sh
python convert_checkpoint.py \
    --model_dir /home/user/hf/deepseek-v3-bf16 \
    --output_dir /home/user/hf/deepseek-v3-int4 \
    --dtype bfloat16 \
    --tp_size 4 \
    --use_weight_only \
    --weight_only_precision int4 \
    --workers 4
```

### Example usage:

```sh
trtllm-build --checkpoint_dir /DeepSeek-V3-int4-TensorRT \
    --output_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    ...
```
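
Once the engine is built, it can be exercised with TensorRT-LLM's example runner. The sketch below is an assumption based on the standard `examples/run.py` script shipped with TensorRT-LLM (the paths, prompt, and `mpirun` rank count for `--tp_size 4` are illustrative, not taken from this repository):

```sh
# Illustrative only: run the TP=4 engine via TensorRT-LLM's examples/run.py.
# One MPI rank per tensor-parallel shard.
mpirun -n 4 python examples/run.py \
    --engine_dir ./trtllm_engines/deepseek_v3/int4/tp4-sel4096-isl2048-bs4 \
    --tokenizer_dir /home/user/hf/deepseek-v3-bf16 \
    --max_output_len 64 \
    --input_text "Hello"
```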

### Disclaimer:

This model is a quantized checkpoint intended for research and experimentation with high-performance inference. Use at your own risk, and validate outputs before deploying to production use cases.