---
license: mit
license_name: mit
license_link: LICENSE
library_name: transformers
tags:
- fp8
- vllm
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: text-generation
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
---
# DeepSeek-R1-Distill-Qwen-14B-FP8
An FP8-quantized version of [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), optimized for inference with vLLM. Quantizing weights and activations to FP8 reduces the model's memory footprint by approximately 50%.
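As a rough illustration of vLLM usage (the repository ID below is a placeholder and the sampling settings are arbitrary), offline inference with a pre-quantized FP8 checkpoint would look roughly like this; vLLM reads the quantization config stored with the checkpoint, so no extra quantization flags should be needed:

```python
from vllm import LLM, SamplingParams

# Placeholder repo ID; substitute the actual Hub path of this checkpoint.
MODEL_ID = "DeepSeek-R1-Distill-Qwen-14B-FP8"

# The quantization scheme is picked up from the checkpoint's config.
llm = LLM(model=MODEL_ID)
params = SamplingParams(temperature=0.6, max_tokens=256)

outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```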
## Model Overview
- **Base Model**: DeepSeek-R1-Distill-Qwen-14B
- **Quantization**: FP8 (weights and activations)
- **Memory Reduction**: ~50% (from 16-bit to 8-bit)
- **License**: MIT (following the original model's license)
## Compression Details
Compressed using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with:
- 512 calibration samples from UltraChat
- Symmetric per-tensor quantization
- Quantization applied to the linear operators within the transformer blocks
The compression script is available in `compress.py`.
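For orientation, a minimal sketch of such a flow using LLM Compressor's `oneshot` API is shown below. The dataset identifier, calibration sequence length, FP8 scheme string, and ignored modules are assumptions; `compress.py` remains the authoritative recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
SAVE_DIR = "DeepSeek-R1-Distill-Qwen-14B-FP8"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 quantization of weights and activations for the Linear layers inside
# the transformer blocks; skipping the LM head is an assumption here.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="ultrachat_200k",      # assumed identifier for the UltraChat calibration set
    recipe=recipe,
    num_calibration_samples=512,
    max_seq_length=2048,           # assumed calibration sequence length
)

# Save the compressed checkpoint alongside its tokenizer.
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```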
## Requirements
- vLLM
- transformers
- torch
- accelerate
## Note
This is an experimental compression of the model. Performance metrics and optimal usage parameters have not been thoroughly tested yet.