---
license: mit
license_name: mit
license_link: LICENSE
library_name: transformers
tags:
    - fp8
    - vllm
language:
    - en
    - de
    - fr
    - it
    - pt
    - hi
    - es
    - th
pipeline_tag: text-generation
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
---

# DeepSeek-R1-Distill-Qwen-14B-FP8

FP8-quantized version of [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), optimized for inference with vLLM. The quantization reduces the model's memory footprint by approximately 50%.
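
A minimal sketch of running inference with vLLM is shown below. The repository id and generation parameters are illustrative assumptions; point `model=` at wherever the FP8 weights are actually hosted.

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint; vLLM picks up the quantization config from the model files.
# The repository id here is an assumption; replace it with the actual model path.
llm = LLM(model="DeepSeek-R1-Distill-Qwen-14B-FP8", max_model_len=4096)

sampling = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], sampling)
print(outputs[0].outputs[0].text)
```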

## Model Overview

-   **Base Model**: DeepSeek-R1-Distill-Qwen-14B
-   **Quantization**: FP8 (weights and activations)
-   **Memory Reduction**: ~50% (from 16-bit to 8-bit)
-   **License**: MIT License (following the original model's license)

## Compression Details

Compressed using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with:

-   512 calibration samples from UltraChat
-   Symmetric per-tensor quantization
-   Applied to linear operators within transformer blocks

The compression script is available in `compress.py`.
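
For reference, the outline below sketches how such a script might look with LLM Compressor. It is an illustrative approximation of the settings above, not the verbatim contents of `compress.py`; the dataset identifier, sequence length, and output directory are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Static, symmetric per-tensor FP8 quantization of weights and activations for the
# Linear layers inside the transformer blocks; lm_head stays in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

# 512 UltraChat calibration samples are used to fit the quantization scales.
oneshot(
    model=model,
    dataset="ultrachat_200k",   # assumed dataset identifier
    recipe=recipe,
    num_calibration_samples=512,
    max_seq_length=2048,        # assumed calibration sequence length
    output_dir="DeepSeek-R1-Distill-Qwen-14B-FP8",
)

tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-14B-FP8")
```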

## Requirements

-   vLLM
-   transformers
-   torch
-   accelerate

## Note

This is an experimental compression of the model; accuracy, throughput, and recommended inference settings have not yet been thoroughly evaluated.