RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440
I have been trying to run the example, but so far I have ended up with the following error:
File ~/anaconda3/envs/triton/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:261 in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440
The issue is generally with the transformers version. You will need transformers>=4.31.0 to make this work.
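For what it's worth, the size in the error is consistent with a grouped-query attention (GQA) checkpoint, which is what transformers 4.31.0 added support for: the old modeling code reshapes the k_proj output using all 64 attention heads, but a GQA checkpoint only produces 8 key/value heads' worth of values. A quick sanity check with the numbers from the traceback (just a sketch, not library code):

```python
# Numbers taken from the traceback above.
bsz, q_len, head_dim = 1, 60, 128
num_heads = 64  # heads the old modeling code assumes for key_states

expected = bsz * q_len * num_heads * head_dim
print(expected)  # 491520 elements expected by .view(...)

actual = 61440  # elements actually produced by self.k_proj(hidden_states)
print(actual // (bsz * q_len * head_dim))  # -> 8 key/value heads (GQA)
```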
Thanks. Seemed to be the problem
How to solve it?
The issue is generally with the transformers version. You will need transformers>=4.31.0 to make this work.
I upgraded transformers to 4.31.0, but it didn't solve the issue.
And one strange problem: 7B and 13B work, but 70B fails.
I have the same issue with the 70B version of the model.
You also need Python>=3.8 to address this issue.
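A quick way to confirm both requirements in the environment that actually runs the model (packaging ships as a transformers dependency):

```python
import sys

import transformers
from packaging import version  # installed alongside transformers

print(sys.version)               # needs Python >= 3.8
print(transformers.__version__)  # needs >= 4.31.0

assert sys.version_info >= (3, 8)
assert version.parse(transformers.__version__) >= version.parse("4.31.0")
```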
Same issue (but with the Llama-3-8B model).
python==3.9 and transformers==4.41.0 don't work :/
Any solution?
model: 'meta-llama/Meta-Llama-3-8B-Instruct'
using a Tesla K80
CUDA 11.6, NVIDIA 470 drivers
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0+cu116 -f https://download.pytorch.org/whl/cu116/torch_stable.html
pip install -r requirements.txt
requirements.txt:
transformers==4.31.0 # For working with Meta LLaMA and BitsAndBytesConfig
accelerate==0.21.0 # For multi-GPU handling and model acceleration
bitsandbytes==0.38.1 # For 8-bit quantization
scipy==1.9.3
It works fine.
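In case it helps anyone else, here is a minimal loading sketch that matches this requirements set. The model id is illustrative; swap in whichever gated Llama checkpoint you have access to:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; the repo is gated and needs approval

# 8-bit quantization via bitsandbytes, as pinned in requirements.txt
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # accelerate places layers across the available GPUs
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```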