When load_in_8bit=True, the chat becomes VERY VERY SLOW and returns nothing
#53, opened by leoyangsw
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "THUDM/chatglm-6b"
# Load the model in 8-bit; device_map="auto" decides where the weights are placed.
model = AutoModel.from_pretrained(checkpoint, torch_dtype=torch.float16, device_map="auto", load_in_8bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

history = []
while True:
    query = input("Man:\n").strip()
    # This call is VERY VERY SLOW and returns nothing.
    response, history = model.chat(tokenizer, query, history=history)
    print("\nBot:\n" + response)
I have the same problem. Has it been solved?
I ran into the same problem: it is extremely slow and returns nothing. Have you solved it?
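One possible first check, offered as a sketch and not a confirmed diagnosis for this thread: when a model is loaded with device_map="auto", transformers records where each module was placed in model.hf_device_map, and modules that ended up on "cpu" or "disk" can make generation much slower than a fully on-GPU model. The helper name summarize_device_map and the example mapping below are hypothetical and only illustrate how to read that mapping.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count modules per device and list the ones offloaded to CPU or disk.

    device_map is a dict of module name -> device, in the shape of
    model.hf_device_map (devices may be ints such as 0, or strings).
    """
    counts = Counter(str(device) for device in device_map.values())
    offloaded = sorted(
        name for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )
    return dict(counts), offloaded


# Hypothetical mapping for illustration only; with a real model,
# pass model.hf_device_map instead.
example_map = {
    "transformer.word_embeddings": 0,
    "transformer.layers.0": 0,
    "transformer.layers.1": "cpu",
    "lm_head": "disk",
}

counts, offloaded = summarize_device_map(example_map)
print("modules per device:", counts)
print("offloaded modules:", offloaded)
```

If the offloaded list is non-empty for the real model, that would be one candidate explanation for the slowdown, though it does not by itself explain the empty response.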