This is a GPTQ quantization of baichuan-inc/Baichuan-13B-Chat. It can be loaded directly, occupies roughly 12 GB of GPU memory, and works well in practice.
Usage code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_dir = 'yinfupai/Baichuan-13B-Chat-GPTQ'

# Load the tokenizer and the quantized model
# (the repo contains custom code, so trust_remote_code=True is required)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model.generation_config = GenerationConfig.from_pretrained(model_dir)
model.eval()

# Build the message list in the format Baichuan requires
messages = []
messages.append({"role": "user", "content": "列举一下先天八卦的卦象"})  # "List the trigrams of the Primordial Bagua"
response = model.chat(tokenizer, messages)
print(response)
```
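To continue a conversation, append the previous reply as an `assistant` message and add the next `user` turn in the same role/content format, then call `model.chat` again. A minimal sketch; the follow-up question is only illustrative:

```python
# Multi-turn chat: feed the previous reply back as an "assistant" message,
# then append the next user turn (the question here is a made-up example)
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Explain the meaning of each one."})
response = model.chat(tokenizer, messages)
print(response)
```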
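To check the ~12 GB figure on your own hardware, you can read the allocated GPU memory after loading. A minimal sketch using standard PyTorch calls, assuming a single CUDA device:

```python
# Rough check of GPU memory currently allocated by tensors on device 0
if torch.cuda.is_available():
    allocated_gb = torch.cuda.memory_allocated(0) / 1024**3
    print(f"GPU memory allocated: {allocated_gb:.1f} GB")
```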
Please note the model's commercial-use license and follow the statement on the baichuan-inc/Baichuan-13B-Chat model page.