Bug: Model fails on long-context tests that the previous version passes
#1 opened by liyucheng
To reproduce, with transformers==4.46.1:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = 'THUDM/glm-4-9b-chat-1m-hf'
# MODEL_PATH = 'THUDM/glm-4-9b-chat-1m'

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    torch_dtype='auto',
    attn_implementation='flash_attention_2',
    trust_remote_code=True,
)

# Long-context test prompt (see the Google Drive link below).
with open('kv-example.txt', 'r') as f:
    message = f.read()

message = [
    {"role": "system", "content": "Answer the following question."},
    {"role": "user", "content": message},
]

inputs = tokenizer.apply_chat_template(
    message,
    return_tensors='pt',
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

input_len = inputs['input_ids'].shape[1]
generate_kwargs = {
    "input_ids": inputs['input_ids'],
    "attention_mask": inputs['attention_mask'],
    "max_new_tokens": 128,
    "do_sample": False,  # greedy decoding, so the two versions are directly comparable
}
out = model.generate(**generate_kwargs)
print(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))
```
The result:

```
(glm) (base) aiscuser@node-0:/scratch/MInference/eval/multiturn_bench$ python kv-example.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.00s/it]
f8f8f8b7f8b8f8f8b8f8f8f8f8f8f8f8b7b7f8f8f8f8f8b7f8f8f8f3f8f8f8f8f8f8f8f8f8f8f8f8b8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8f8
```
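In case it helps narrow things down, one variable worth isolating is the attention backend. A minimal triage sketch (my own idea, not something I have confirmed changes the result): reload the -hf checkpoint with the stock 'sdpa' or 'eager' backends instead of 'flash_attention_2' and rerun the same script.

```python
from transformers import AutoModelForCausalLM

# Triage sketch: same checkpoint, different attention backend, to check
# whether flash_attention_2 is implicated in the garbage output.
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/glm-4-9b-chat-1m-hf',
    device_map="auto",
    torch_dtype='auto',
    attn_implementation='sdpa',  # or 'eager'
    trust_remote_code=True,
)
```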
Downgrading to transformers==4.44.2 and switching back to the old MODEL_PATH = 'THUDM/glm-4-9b-chat-1m' gives the correct answer:
```
(glm) (base) aiscuser@node-0:/scratch/MInference/eval/multiturn_bench$ python kv-example.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:05<00:00, 1.88it/s]
The value associated with the specified key "6ab6ea3e-f288-4f33-ba46-7f42bb75b03f" is "cb59052b-9128-4979-9c0e-e1de4adcf73b".
```
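One easy pitfall when flipping between versions is accidentally running against the wrong install; a trivial guard I add at the top of the script (a habit of mine, not part of the original repro):

```python
import transformers

# Fail fast if the environment is not on the version being tested.
assert transformers.__version__ == "4.44.2", transformers.__version__
```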
@zRzRzRzRzRzRzR can you take a look at this and see where the problem is?
The test example can be found here: https://drive.google.com/file/d/1t3Wl1PAe_2a_xGxI34mGMX6Q2PcOoTHL/view?usp=sharing
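For anyone who cannot access the file: judging from the expected answer above, it is a key-value retrieval test. A rough sketch of how to build a similar synthetic prompt (the format is my guess, not the exact contents of the linked file):

```python
import uuid

# Hypothetical stand-in for kv-example.txt: a long list of UUID key-value
# pairs followed by a question about one specific key, in the spirit of
# the expected answer shown above. Not the exact contents of the file.
pairs = {str(uuid.uuid4()): str(uuid.uuid4()) for _ in range(2000)}
needle_key = next(iter(pairs))

lines = [f'The value associated with the key "{k}" is "{v}".'
         for k, v in pairs.items()]
prompt = "\n".join(lines) + (
    f'\n\nWhat is the value associated with the specified key "{needle_key}"?'
)

with open('kv-example.txt', 'w') as f:
    f.write(prompt)
```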
My environment:

```
accelerate            1.0.1
bitsandbytes          0.44.1
einops                0.8.0
gradio                5.4.0
gradio_client         1.4.2
huggingface-hub       0.26.2
numpy                 1.26.4
openai                1.52.2
pillow                10.4.0
pydantic              2.9.2
pydantic_core         2.23.4
sentence-transformers 3.2.1
sentencepiece         0.2.0
sse-starlette         2.1.3
tiktoken              0.7.0
timm                  1.0.11
torch                 2.5.1
torchvision           0.20.1
transformers          4.46.1
```