Molmo-7B-GPTQ-4bit 🚀
Overview
The Molmo-7B-GPTQ-4bit model is a 4-bit quantized build of allenai/Molmo-7B-D-0924, a transformer-based vision-language model, prepared for efficient deployment. The quantization was performed with bitsandbytes rather than AutoGPTQ, which does not natively support this model format at the time of writing. It leverages the BitsAndBytesConfig from the transformers library, enabling optimized GPU inference with reduced memory usage.
Model Information
- Model Name: Molmo-7B-GPTQ-4bit
- Base Model: allenai/Molmo-7B-D-0924
- Quantization: 4-bit quantization using bitsandbytes instead of AutoGPTQ
- Repository URL: zamal/Molmo-7B-GPTQ-4bit
Technical Details
This model is quantized using bitsandbytes (not AutoGPTQ), as AutoGPTQ currently lacks direct support for NF4 4-bit quantization. This approach allows efficient 4-bit precision inference with minimal loss in performance and reduced memory overhead.
Key Quantization Configurations:
- bnb_4bit_use_double_quant: Enabled; this quantizes the quantization constants themselves, yielding additional memory savings.
- bnb_4bit_quant_type: NF4 (Normal Float 4-bit), which preserves accuracy better than standard 4-bit floats for normally distributed weights.
- bnb_4bit_compute_dtype: FP16 (float16) to accelerate GPU-based inference.
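As a concrete illustration, these settings map onto a BitsAndBytesConfig as follows. This is a minimal sketch of the configuration described above, not necessarily the exact code used to prepare this repository:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization settings matching the configuration listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit on load
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # Normal Float 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # run compute in FP16 for GPU speed
)
```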
Device Compatibility:
- bitsandbytes automatically handles device mapping for GPUs via the device_map="auto" parameter (see the loading sketch below).
- 4-bit models are ideal for GPUs with limited VRAM, allowing inference on larger models without exceeding hardware memory limits.
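For example, the configuration above can be passed at load time together with device_map="auto". This sketch assumes the repository loads through the standard transformers auto classes with trust_remote_code=True, as the base Molmo model does:

```python
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "zamal/Molmo-7B-GPTQ-4bit",
    quantization_config=bnb_config,  # BitsAndBytesConfig from the snippet above
    device_map="auto",               # let accelerate place layers on available GPUs
    trust_remote_code=True,          # Molmo ships custom modeling code
)
processor = AutoProcessor.from_pretrained(
    "zamal/Molmo-7B-GPTQ-4bit",
    trust_remote_code=True,
)
```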
Limitations
- Precision Loss: 4-bit quantization trades a small amount of precision for efficiency, so outputs may differ slightly from the original full-precision model.
- AutoGPTQ Limitation: As mentioned, AutoGPTQ does not natively support this kind of quantization; it was achieved through bitsandbytes and the transformers library.
Usage
Installation
Make sure you have the necessary dependencies installed:
```
pip install transformers torch Pillow torchvision einops accelerate tensorflow bitsandbytes
```
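After installation, inference can follow the base model's published usage pattern (Molmo's custom code exposes processor.process and model.generate_from_batch). This is a sketch assuming the quantized repository keeps the same interface; the image path and prompt are placeholders:

```python
import torch
from PIL import Image
from transformers import (
    AutoModelForCausalLM,
    AutoProcessor,
    BitsAndBytesConfig,
    GenerationConfig,
)

# 4-bit quantization settings as described in the Technical Details section
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "zamal/Molmo-7B-GPTQ-4bit",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "zamal/Molmo-7B-GPTQ-4bit",
    trust_remote_code=True,
)

# Prepare an image + text prompt (Molmo is a vision-language model)
inputs = processor.process(
    images=[Image.open("example.jpg")],  # placeholder image path
    text="Describe this image.",
)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate, then decode only the newly produced tokens
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```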