parlance-labs/hc-mistral-alpaca-merged-awq

4-bit AWQ-quantized version of [parlance-labs/hc-mistral-alpaca-merged](https://huggingface.co/parlance-labs/hc-mistral-alpaca-merged).

The following script uses [AutoAWQ](https://github.com/casper-hansen/AutoAWQ/tree/main) to quantize the model:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Setup: 4-bit weights, one scale and zero point per group of 128 weights, GEMM kernels
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
quant_path = "hc-mistral-alpaca-merged-awq"  # local output directory
model_path = "parlance-labs/hc-mistral-alpaca-merged"  # source model on the Hub

# Load the unquantized model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize, then save the model and tokenizer to quant_path
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
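
To give a rough sense of what the `quant_config` values mean for storage, the sketch below estimates the effective number of bits per quantized weight. It assumes that each group of `q_group_size` weights shares one 16-bit scale and, when `zero_point` is enabled, one zero point stored at `w_bit` bits. The helper `effective_bits_per_weight` and the `scale_bits` parameter are hypothetical names used for illustration and are not part of AutoAWQ. Layers left unquantized and packing details are ignored.

```python
def effective_bits_per_weight(w_bit, q_group_size, zero_point, scale_bits=16):
    """Estimate storage per quantized weight, including per-group metadata.

    Assumption: every group of q_group_size weights shares one scale
    (scale_bits wide) and, if zero_point is True, one w_bit-wide zero point.
    """
    overhead_bits = scale_bits + (w_bit if zero_point else 0)
    return w_bit + overhead_bits / q_group_size


quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

bits = effective_bits_per_weight(
    quant_config["w_bit"],
    quant_config["q_group_size"],
    quant_config["zero_point"],
)
print(bits)  # 4.15625
print(f"{16 / bits:.2f}x smaller than fp16 for the quantized layers")
```

Under these assumptions, the group metadata adds about 0.16 bits per weight on top of the nominal 4 bits. A smaller `q_group_size` would increase that overhead.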

After saving the model, you can upload it to the Hub:

```bash
cd hc-mistral-alpaca-merged-awq
huggingface-cli upload parlance-labs/hc-mistral-alpaca-merged-awq .
```
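
The same upload can also be done from Python. The following is a minimal sketch that uses `HfApi.create_repo` and `HfApi.upload_folder` from the `huggingface_hub` package. It assumes you are already authenticated (for example through `huggingface-cli login`). The wrapper `upload_quantized_model` and its `api` parameter are hypothetical names introduced here for illustration.

```python
def upload_quantized_model(folder_path, repo_id, api=None):
    """Upload a local folder of quantized weights to a model repo on the Hub.

    `api` is an illustrative hook: pass any object that exposes `create_repo`
    and `upload_folder`; by default a real `huggingface_hub.HfApi` is used.
    """
    if api is None:
        # Imported lazily so the helper can be defined without the dependency.
        from huggingface_hub import HfApi
        api = HfApi()
    # Create the target repo if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Upload every file in folder_path to the root of the repo.
    return api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="model",
    )


# Example call (performs a real upload, so it is left commented out):
# upload_quantized_model(
#     "hc-mistral-alpaca-merged-awq",
#     "parlance-labs/hc-mistral-alpaca-merged-awq",
# )
```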