
4-bit AWQ Quantized Version of parlance-labs/hc-mistral-alpaca-merged

The following shows how to use AutoAWQ to quantize the model:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# setup
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
quant_path = "hc-mistral-alpaca-merged-awq"
model_path = "parlance-labs/hc-mistral-alpaca-merged"

# load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# quantize and save model
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
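
Once quantized, the model can be loaded back for inference with AutoAWQ's from_quantized loader. The sketch below is illustrative: the prompt text and generation settings are placeholder assumptions, not part of the original workflow.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "hc-mistral-alpaca-merged-awq"

# load the 4-bit quantized weights (fuse_layers speeds up inference)
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# placeholder prompt; adjust to match the model's instruction format
prompt = "Tell me about quantization."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))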

After saving the model, you can upload it to the Hub:

cd hc-mistral-alpaca-merged-awq
huggingface-cli upload parlance-labs/hc-mistral-alpaca-merged-awq .
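
Alternatively, here is a sketch of the same upload done from Python with the huggingface_hub client (assuming you are already authenticated, e.g. via huggingface-cli login):

from huggingface_hub import HfApi

api = HfApi()
# create the target repo if it does not exist yet, then push the folder
api.create_repo("parlance-labs/hc-mistral-alpaca-merged-awq", exist_ok=True)
api.upload_folder(
    folder_path="hc-mistral-alpaca-merged-awq",
    repo_id="parlance-labs/hc-mistral-alpaca-merged-awq",
    repo_type="model",
)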