---
language:
- zh
- en
pipeline_tag: text-generation
inference: false
---
See the original project: [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)
Changes: the original model was quantized to 8-bit and saved as checkpoint shards of roughly 2 GB each.
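The quantized shards can be produced roughly as sketched below. This is a minimal sketch, not the exact export script: it assumes a transformers/bitsandbytes version that supports int8 serialization, and the output directory name is illustrative.

```python
# Minimal sketch: quantize the original checkpoint to 8-bit and save it in 2 GB shards.
# Assumes a recent transformers + bitsandbytes with int8 serialization support;
# the output directory "Baichuan-13B-Chat-8bit" is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map="auto",
    load_in_8bit=True,  # quantize weights to 8-bit via bitsandbytes
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True
)

model.save_pretrained("Baichuan-13B-Chat-8bit", max_shard_size="2GB")
tokenizer.save_pretrained("Baichuan-13B-Chat-8bit")
```

Usage: load the quantized checkpoint and chat with it just like the original model: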
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit", use_fast=False, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit", device_map="auto", trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit"
)

messages = []
messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})  # "Which is the world's second highest mountain?"
response = model.chat(tokenizer, messages)
print(response)
```
To use int4 quantization instead:
```python
model = AutoModelForCausalLM.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit",
    device_map="auto",
    load_in_4bit=True,
    trust_remote_code=True,
)
```
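Note: 4-bit loading goes through the bitsandbytes integration in transformers, so a sufficiently recent `bitsandbytes` (and `accelerate`) installation is required, e.g. `pip install -U bitsandbytes accelerate`.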