---
license: mit
license_name: mit
license_link: LICENSE
library_name: transformers
tags:
  - fp8
  - vllm
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
pipeline_tag: text-generation
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
---

# DeepSeek-R1-Distill-Qwen-14B-FP8

An FP8-quantized version of DeepSeek-R1-Distill-Qwen-14B, optimized for inference with vLLM. Quantizing both weights and activations to FP8 reduces the model's memory footprint by approximately 50%.

## Model Overview

- Base Model: [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
- Quantization: FP8 (weights and activations)
- Memory Reduction: ~50% (16-bit to 8-bit)
- License: MIT (following the original model's license)
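
At 14B parameters, that works out to roughly 28 GB of weights at 16-bit precision versus roughly 14 GB at FP8 (activation and KV-cache memory not included).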

## Compression Details

The model was compressed using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with:

- 512 calibration samples from UltraChat
- Symmetric per-tensor quantization
- Quantization applied to the linear operators within transformer blocks

The compression script is available in `compress.py`.
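
For reference, here is a minimal sketch of what such a script can look like using LLM Compressor's `oneshot` API. The dataset split, sequence length, and output directory below are illustrative assumptions, not values taken from `compress.py`:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
SAVE_DIR = "DeepSeek-R1-Distill-Qwen-14B-FP8"  # assumed output directory

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 512 calibration samples from UltraChat, rendered with the model's chat template
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(512))
ds = ds.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})

# FP8 (per-tensor) quantization of weights and activations for the Linear
# layers inside the transformer blocks; lm_head stays in higher precision
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,  # assumed calibration sequence length
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```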

## Requirements

- `vllm`
- `transformers`
- `torch`
- `accelerate`
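
Once installed, inference is a standard vLLM call. The repository id below is an assumed placeholder; substitute the actual model path or a local directory:

```python
from vllm import LLM, SamplingParams

# Assumed repository id; replace with the actual location of this checkpoint
llm = LLM(model="emilss/DeepSeek-R1-Distill-Qwen-14B-FP8")

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```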

## Note

This is an experimental compression of the model. Performance metrics and optimal usage parameters have not been thoroughly tested yet.