ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
I get the following error with the example code:
ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
I'm using the exact demo code from the model card.
I also ran the demo code as-is, and I only received the above error message when I entered the wrong prompt format.
I don't understand what you're trying to say.
I used the demo code from the 1.6-34b model card, which is the model this community page is for.
It has the system prompt built in.
Are you in the right place?
Yes, I used the demo code as is and it worked fine, but I modified the prompt incorrectly and the error above occurred.
I think your issue was different. I have not modified anything; I simply copy/paste the code and execute it, and I receive the error.
Same problem!
And I'm using the Git version of Transformers; there's no difference between the release version and Git main.
To compare inference speeds, I ran both the Mistral 7B model and the 34B model on four V100 GPUs.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.40.0.dev0 requires tokenizers<0.19,>=0.14, but you have tokenizers 0.19.0 which is incompatible.
Even trying to install the latest Tokenizers library (I was running 0.15.2) doesn't work with the latest Transformers main branch.
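If it's just the version pin that's failing, one possible workaround (based purely on the constraint printed in the error above) is to install a tokenizers release inside the required range, for example:
pip install "tokenizers>=0.14,<0.19"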
What a wild thing to observe, considering both projects are from the same team and rely on each other so heavily.
INFO:root:Processing image: anime-summerghost-54.png, data: <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1920x1080 at 0x16FC842E0>
INFO:root:Using LLaVA 1.6+ model.
INFO:root:Inputs: {'input_ids': tensor([[59603, 9334, 1397, 562, 13310, 2756, 597, 663, 15874, 10357,
14135, 98, 707, 14135, 3641, 6901, 97, 7283, 97, 597,
31081, 8476, 592, 567, 2756, 59610, 59575, 3275, 98, 2134,
1471, 59601, 59568, 64000, 144, 5697, 620, 2709, 594, 719,
2728, 100, 39965, 8898, 9129, 59601]], device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
device='mps:0'), 'pixel_values': tensor([[[[[ 1.3464, 1.3464, 1.3464, ..., 0.0325, 0.1201, 0.1493],
Mine has 64000 in there, but it still doesn't work, even though I switched the processor config to use_fast=False.
Will investigate, thanks for reporting.
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, BitsAndBytesConfig
import torch
from PIL import Image
import requests
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
)
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-34b-hf", quantization_config=quantization_config, device_map="auto")
#model.to("cuda:0")
# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"
inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
@ptx0
Added quantization code for inference on multiple GPUs.
I'm hitting the same problem.
Me too.
Should be solved in the latest version of transformers. Can you confirm you still observe the bug after updating?
Also, I believe there was a similar problem when using the "mps" device; see https://github.com/huggingface/transformers/issues/30294 for details.
Same problem; upgrading transformers to 4.42.3 does not solve the issue.
I found that the token index of <image> in added_tokens.json is 64003, but the default image_token_index is 64000.
So I added one line to the demo code, and then it worked:
inputs['input_ids'][inputs['input_ids'] == 64003] = 64000
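For context, here is a minimal sketch of where that remapping sits relative to the demo code above (the 64003/64000 IDs come from this checkpoint's added_tokens.json and model config, so they only apply to this checkpoint):
inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
# Work around the mismatch between the tokenizer's <image> id (64003)
# and the model config's image_token_index (64000) for this checkpoint.
inputs['input_ids'][inputs['input_ids'] == 64003] = 64000
output = model.generate(**inputs, max_new_tokens=100)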
Hi, yes, this is being discussed here: https://github.com/huggingface/transformers/issues/31713
Rolled back the commits to make sure it works. The updates were related to adding the chat template, which @RaushanTurganbay will take care of when she's back.
I'm using "llava-hf/llava-v1.6-mistral-7b-hf" and just got rid of this error in my code. Double-check that you're using the LlavaNext classes everywhere: I was using LlavaForConditionalGeneration instead of LlavaNextForConditionalGeneration.
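For anyone who made the same mix-up, a minimal sketch of loading that checkpoint with the LlavaNext classes (same structure as the demo code above, just without quantization; take the prompt template from that model card):
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
# Use the LlavaNext* classes, not LlavaForConditionalGeneration, for the v1.6 checkpoints.
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)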