Bug: Model fails on long-context tests that the previous version passes
#1 opened by liyucheng
To reproduce, with transformers==4.46.1:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = 'THUDM/glm-4-9b-chat-1m-hf'
# MODEL_PATH = 'THUDM/glm-4-9b-chat-1m'

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    torch_dtype='auto',
    attn_implementation='flash_attention_2',
    trust_remote_code=True,
)

# Long-context test prompt (see the Google Drive link below).
with open('kv-example.txt', 'r') as f:
    message = f.read()

message = [
    {"role": "system", "content": "Answer the following question."},
    {"role": "user", "content": message},
]

inputs = tokenizer.apply_chat_template(
    message,
    return_tensors='pt',
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

input_len = inputs['input_ids'].shape[1]
generate_kwargs = {
    "input_ids": inputs['input_ids'],
    "attention_mask": inputs['attention_mask'],
    "max_new_tokens": 128,
    "do_sample": False,  # greedy decoding, so the two versions are directly comparable
}
out = model.generate(**generate_kwargs)
print(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))
```
The result:

```
(glm) (base) aiscuser@node-0:/scratch/MInference/eval/multiturn_bench$ python kv-example.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.00s/it]
f8f8f8b7f8b8f8f8b8f8f8f8f8f8f8f8b7b7f8f8f8f8f8b7f8f8f8f3f8f8f8f8f8f8f8f8f8f8f8f8b8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8
```
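In case it helps narrow things down, one variable worth isolating is the attention backend. A minimal triage sketch (my own idea, not something I have confirmed changes the result): reload the -hf checkpoint with the stock 'sdpa' or 'eager' backends instead of 'flash_attention_2' and rerun the same script.

```python
from transformers import AutoModelForCausalLM

# Triage sketch: same checkpoint, different attention backend, to check
# whether flash_attention_2 is implicated in the garbage output.
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/glm-4-9b-chat-1m-hf',
    device_map="auto",
    torch_dtype='auto',
    attn_implementation='sdpa',  # or 'eager'
    trust_remote_code=True,
)
```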
Downgrading to transformers==4.44.2 and switching back to the old MODEL_PATH = 'THUDM/glm-4-9b-chat-1m' gives the correct answer:
```
(glm) (base) aiscuser@node-0:/scratch/MInference/eval/multiturn_bench$ python kv-example.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:05<00:00, 1.88it/s]
The value associated with the specified key "6ab6ea3e-f288-4f33-ba46-7f42bb75b03f" is "cb59052b-9128-4979-9c0e-e1de4adcf73b".
```
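One easy pitfall when flipping between versions is accidentally running against the wrong install; a trivial guard I add at the top of the script (a habit of mine, not part of the original repro):

```python
import transformers

# Fail fast if the environment is not on the version being tested.
assert transformers.__version__ == "4.44.2", transformers.__version__
```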
@zRzRzRzRzRzRzR can you take a look at this and see where the problem is?
The test example can be found here: https://drive.google.com/file/d/1t3Wl1PAe_2a_xGxI34mGMX6Q2PcOoTHL/view?usp=sharing
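For anyone who cannot access the file: judging from the expected answer above, it is a key-value retrieval test. A rough sketch of how to build a similar synthetic prompt (the format is my guess, not the exact contents of the linked file):

```python
import uuid

# Hypothetical stand-in for kv-example.txt: a long list of UUID key-value
# pairs followed by a question about one specific key, in the spirit of
# the expected answer shown above. Not the exact contents of the file.
pairs = {str(uuid.uuid4()): str(uuid.uuid4()) for _ in range(2000)}
needle_key = next(iter(pairs))

lines = [f'The value associated with the key "{k}" is "{v}".'
         for k, v in pairs.items()]
prompt = "\n".join(lines) + (
    f'\n\nWhat is the value associated with the specified key "{needle_key}"?'
)

with open('kv-example.txt', 'w') as f:
    f.write(prompt)
```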
My environment:

```
accelerate            1.0.1
bitsandbytes          0.44.1
einops                0.8.0
gradio                5.4.0
gradio_client         1.4.2
huggingface-hub       0.26.2
numpy                 1.26.4
openai                1.52.2
pillow                10.4.0
pydantic              2.9.2
pydantic_core         2.23.4
sentence-transformers 3.2.1
sentencepiece         0.2.0
sse-starlette         2.1.3
tiktoken              0.7.0
timm                  1.0.11
torch                 2.5.1
torchvision           0.20.1
transformers          4.46.1
```