AWQ quantization method
#1 by Nelson365487 - opened
I'm curious which dataset was used for the AWQ quantization. Did you just follow the steps in the AutoAWQ example (https://github.com/casper-hansen/AutoAWQ/blob/main/examples/quantize.py), which uses the default dataset mit-han-lab/pile-val-backup (https://huggingface.co/datasets/mit-han-lab/pile-val-backup)?
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_path = 'mistralai/Mistral-7B-Instruct-v0.2'
quant_path = 'mistral-instruct-v0.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
model.quantize(tokenizer, quant_config=quant_config)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')
The reason I'm asking is that I want to quantize the latest model, yentinglin/Llama-3-Taiwan-70B-Instruct, myself.
Same code, but I use https://huggingface.co/datasets/yentinglin/TaiwanChat as the calibration dataset.
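For anyone trying the same thing, here is a rough sketch of how a custom calibration set could be fed into the script above. It relies on the `calib_data` argument of AutoAWQ's `quantize`, which can take a list of strings instead of the default dataset. The helper names `build_calibration_texts` and `quantize_with_custom_calibration`, the `messages` column name, the `content` field, and the sample count are all assumptions for illustration, not the exact code used for this model; check the dataset card for the real schema.

```python
from typing import Iterable, List, Mapping, Sequence


def build_calibration_texts(
    records: Iterable[Mapping],
    messages_key: str = "messages",  # assumed column name, verify on the dataset card
    max_samples: int = 128,          # assumed sample count, tune as needed
    min_chars: int = 1,
) -> List[str]:
    """Flatten chat-style records into plain strings usable as calibration text."""
    texts: List[str] = []
    for record in records:
        turns = record.get(messages_key) or []
        # Join all turns of one conversation into a single calibration sample.
        text = "\n".join(str(turn.get("content", "")).strip() for turn in turns).strip()
        if len(text) >= min_chars:
            texts.append(text)
        if len(texts) >= max_samples:
            break
    return texts


def quantize_with_custom_calibration(
    model_path: str, quant_path: str, texts: Sequence[str]
) -> None:
    """Same flow as the script above, but with our own calibration texts."""
    # Imported lazily so the helper above works without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
    model = AutoAWQForCausalLM.from_pretrained(
        model_path, low_cpu_mem_usage=True, use_cache=False
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # Pass a list of strings instead of relying on the default calibration dataset.
    model.quantize(tokenizer, quant_config=quant_config, calib_data=list(texts))
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

The records would typically come from something like `datasets.load_dataset("yentinglin/TaiwanChat", split="train")`, though the split name is also an assumption here. Keeping the text preparation separate from the quantization call makes it easy to inspect a few samples before committing to a long 70B run.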
Thank you for your prompt reply; I did not expect it at all :D
Nelson365487 changed discussion status to closed