The code provided in the model card does not work

#1
by AnalogAiBert - opened

For starters, `TextStreamer` is not imported, but that's an easy one :D. Also, for the snapshot download I get a 401, and `config.tokenizer_name` does not seem to exist. Anyway, I've modded the script so it works with a local repo; if I understand `snapshot_download` correctly, it does nothing more than that anyway (rough sketch below). But I'm getting a strange error.
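If it helps, my mental model of `snapshot_download` is just: pull every file of the repo into the local Hugging Face cache and hand back that directory path, which then behaves like any local model directory. A rough sketch (the repo id below is only a placeholder, not confirmed from this thread):

```python
from huggingface_hub import snapshot_download
from transformers import AutoConfig

# Fetch the whole repo into the local HF cache and get the local path back.
# (Placeholder repo id, used here purely for illustration.)
local_path = snapshot_download("abhinavkulkarni/meta-llama-Llama-2-13b-chat-hf-w4-g128-awq")

# The returned path can be used like any local model directory:
config = AutoConfig.from_pretrained(local_path, trust_remote_code=True)
```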

The script:
```python
import torch
from awq.quantize.quantizer import real_quantize_model_weight
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer, TextStreamer
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import snapshot_download

model_name = "./meta-llama-Llama-2-13b-chat-hf-w4-g128-awq"

# Config
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
print(config)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Model
w_bit = 4
q_config = {
    "zero_point": True,
    "q_group_size": 128,
}

# load_quant = snapshot_download(model_name)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config=config, torch_dtype=torch.float16, trust_remote_code=True)

real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)

model = load_checkpoint_and_dispatch(model, model_name, device_map="balanced")

# Inference
prompt = f'''What is the difference between nuclear fusion and fission?
###Response:'''

input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
output = model.generate(
    inputs=input_ids,
    temperature=0.7,
    max_new_tokens=512,
    top_p=0.15,
    top_k=0,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer)
```

The stack trace:

```
  File "/home/robert/llm/conda/script.py", line 42, in <module>
    output = model.generate(
             ^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
              ^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 291, in forward
    query_slices = self.q_proj.weight.split((self.num_heads * self.head_dim) // self.pretraining_tp, dim=0)
                   ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/transformer/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'WQLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
```

Hey @AnalogAiBert ,

Thanks for reporting this.

> For starters, `TextStreamer` is not imported, but that's an easy one :D

I have updated the repo with the `TextStreamer` import.

> Also, for the snapshot download I get a 401

I'll look into this.
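In the meantime, one thing worth checking (an assumption on my part, not verified for this repo): a 401 from the Hub usually means the request isn't authenticated, and gated repos such as the original Llama 2 weights require a valid access token. Logging in before the download may already resolve it:

```python
from huggingface_hub import login, snapshot_download

# Same effect as running `huggingface-cli login` once in a shell;
# "hf_..." is a placeholder for a real access token.
login(token="hf_...")

# With valid credentials, repos your account can access no longer return 401.
# (Placeholder repo id, as above.)
local_path = snapshot_download("abhinavkulkarni/meta-llama-Llama-2-13b-chat-hf-w4-g128-awq")
```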

> `config.tokenizer_name` does not seem to exist

I've fixed this; please see the updated instructions in the model card.

> But I'm getting a strange error

Please check `config.json` for the appropriate version of the transformers library: `"transformers_version": "4.30.2"`.
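For reference, the traceback runs through the tensor-parallel branch of `modeling_llama.py`, which reads `q_proj.weight` directly; the quantized `WQLinear` layer replaces `weight` with a packed `qweight`, hence the `AttributeError`. A hedged workaround sketch (my reading of the traceback, not an official fix):

```python
# Option 1: pin the transformers version recorded in config.json:
#   pip install transformers==4.30.2

# Option 2 (assumption based on the traceback, untested here): keep inference
# on the regular forward path by disabling the weight-splitting branch, which
# only triggers when pretraining_tp > 1. Set this BEFORE building the model.
from transformers import AutoConfig

model_name = "./meta-llama-Llama-2-13b-chat-hf-w4-g128-awq"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.pretraining_tp = 1
```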

I'm closing this for now. The `snapshot_download` method does work most of the time; if more people report problems with it, I'll look into it again.

Otherwise, I have addressed most of the issues raised in this post.

P.S. I just checked one uploaded AWQ model and it seems to work with the latest transformers release (`transformers==4.31.0`).

abhinavkulkarni changed discussion status to closed

Thanks a lot, it works just fine now. I would have answered and thanked you earlier, but I got rate limited :D (1 message every 48 hours), which is why I provided the code in Markdown... anyway, thanks :D
