# DeepSeek-R1-Distill-Llama-8B-q4f16_ft-MLC
| Model Configuration | Value |
|---|---|
| Source Model | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| Inference API | MLC_LLM |
| Quantization | q4f16_ft |
| Model Type | llama |
| Vocab Size | 128256 |
| Context Window Size | 131072 |
| Prefill Chunk Size | 8192 |
| Temperature | 0.6 |
| Repetition Penalty | 1.0 |
| top_p | 0.95 |
| pad_token_id | 0 |
| bos_token_id | 128000 |
| eos_token_id | 128001 |
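Once the model is served locally through an OpenAI-compatible endpoint (for example with MLC LLM's `mlc_llm serve`), the sampling defaults in the table can be passed straight through in a chat-completion request. A minimal sketch, assuming a hypothetical local endpoint and using the model name from this page as the model ID:

```python
import json

# Sketch of an OpenAI-compatible chat request body for a locally served
# MLC LLM instance. The endpoint URL and model ID are assumptions --
# substitute the values for your own deployment.
payload = {
    "model": "DeepSeek-R1-Distill-Llama-8B-q4f16_ft-MLC",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    # Sampling defaults from the model configuration table above
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.0,
}

# Serialize to JSON, as it would be POSTed to e.g.
# http://localhost:8000/v1/chat/completions (assumed default port)
body = json.dumps(payload)
print(body)
```

The request body mirrors the OpenAI Chat Completions schema, which MLC LLM's REST server accepts; any of the table's sampling values can be overridden per request.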
See [jetson-ai-lab.com/models.html](https://jetson-ai-lab.com/models.html) for benchmarks, examples, and containers to deploy local serving and inference for these quantized models.