Magnum-v4-123b HQQ

This repo contains magnum-v4-123b quantized to 4-bit precision using HQQ.

HQQ provides accuracy comparable to AWQ at 4-bit, but requires no calibration data.

This quant was generated on 8x A40s in only 10 minutes. The script below reproduces the quantization step:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig


# Source model to quantize
model_path = "anthracite-org/magnum-v4-123b"

# 4-bit HQQ settings; no calibration dataset is needed
quant_config = HqqConfig(nbits=4, group_size=128, axis=1)

# Weights are quantized on the fly as the checkpoint is loaded
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             cache_dir='.',
                                             device_map="cuda:0",
                                             quantization_config=quant_config,
                                             low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Save the quantized weights and tokenizer
output_path = "magnum-v4-123b-hqq-4bit"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)

Inference

You can run inference directly with transformers (a sketch follows the aphrodite example below), or serve the model with aphrodite:

pip install aphrodite-engine

aphrodite run alpindale/magnum-v4-123b-hqq-4bit -tp 2
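
For transformers, here is a minimal inference sketch for the quantized checkpoint. The prompt and generation settings are illustrative, and it assumes the hqq package is installed alongside transformers so the HQQ-serialized weights can be loaded, and that the tokenizer ships a chat template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized checkpoint; the HQQ config is read from the saved files,
# so no quantization_config needs to be passed here.
model_path = "alpindale/magnum-v4-123b-hqq-4bit"
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Illustrative prompt, formatted with the tokenizer's chat template
messages = [{"role": "user", "content": "Write a short scene set on a generation ship."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))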