A little leg-up for getting it running please?
Can you please give me a little leg up here? The Chameleon PR is now merged in Transformers; I pulled the merged version, built it, and it all seems to run without errors. The only problem is that it doesn't give me a result: instead of an answer it just gives me back the question I asked. Have you got any idea, please? I am relatively new to all of this and I'm using code that I found in the Transformers examples. I'm sure I'm just being stupid somewhere, but I'd appreciate some help. The code I use is this:
[CODE]
from transformers import ChameleonProcessor, ChameleonForCausalLM
import torch
from PIL import Image
import requests
processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForCausalLM.from_pretrained("facebook/chameleon-7b")
prompt = "What color is the belt in this image?"
image_url = "scene_pics/jiu_jitsu_belt_white_1.jpg"
image = Image.open(image_url)
inputs = processor(prompt, image, return_tensors="pt").to(model.device)
# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=50)
print("aout : " + str(len(output)))
print(processor.decode(output[0], skip_special_tokens=True))
[/CODE]
And what I get back is this:
[CODE]
/mount/ae_cvm/transformers/src/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Some kwargs in processor config are unused and will not have any effect: image_seq_length, image_token.
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:20<00:00,  6.98s/it]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
aout : 1
What color is the belt in this image?
[/CODE]
Pinging @RaushanTurganbay here
@archie1983 Can you verify that you're running the exact same code as here?
Interesting that you didn't get an error when generating with that prompt, because you have to add the special "<image>" token to the prompt in the place where the image should be, like "What color is the belt in this image?<image>".
If you already did that, can you try loading in fp16 by adding torch_dtype=torch.float16 to the model loading code? My guess is that full precision doesn't work well when the weights are in bf16. Note that loading in bf16 currently seems to be failing, which is why I recommended fp16.
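For reference, here's a minimal sketch of that loading call, matching the imports used in your code above (it assumes the same ChameleonForCausalLM class from that snippet):
[CODE]
from transformers import ChameleonProcessor, ChameleonForCausalLM
import torch

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
# load the weights in half precision (fp16) instead of the default fp32
model = ChameleonForCausalLM.from_pretrained(
    "facebook/chameleon-7b",
    torch_dtype=torch.float16,
)
[/CODE]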
Thanks for this, @RaushanTurganbay. Yes, I have the special image token in there; it's just that this discussion forum doesn't display code very well. The reason I tried to run it in fp16 was that when I run it in bf16, as you showed, I get this problem:
[CODE]
/mount/ae_cvm/transformers/src/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Some kwargs in processor config are unused and will not have any effect: image_seq_length, image_token.
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:11<00:00,  3.85s/it]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
File "test2.py", line 36, in
output = model.generate(**inputs, max_new_tokens=50)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/generation/utils.py", line 1969, in generate
result = self._sample(
File "/mount/ae_cvm/transformers/src/transformers/generation/utils.py", line 2912, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1531, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1280, in forward
image_tokens = self.get_image_tokens(pixel_values)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1230, in get_image_tokens
_, _, image_toks = self.vqmodel.encode(pixel_values)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1014, in encode
hidden_states = self.encoder(pixel_values)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 939, in forward
hidden_states = [self.conv_in(pixel_values)]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
[/CODE]
Any ideas on how I can get the bias type to match? Thanks.
@archie1983
The issue with dtype was fixed in the main branch; you can pull the changes with !pip install --upgrade git+https://github.com/huggingface/transformers.git
I cannot say for sure why the model is not replying; I asked the same question with a generic image from the web and got a reply. See the code below.
I can recommend checking the following:
- whether the model is stopping generation because it has generated an 'eos' token
- experiment with other prompts and check how often you see this pattern; there was an issue where the model explicitly refused to reply because of safety alignment
- use these params in generation: do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.2 (see the sketch after the code example below); the model repo doesn't have them as defaults yet. I made a PR to the repo and am waiting for the merge
- if nothing works and it fails to generate way too often, share the image you used as an HF dataset or web link so I can also try it
[CODE]
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration
import torch
import requests
from PIL import Image

model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="cuda:0")
processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")

prompt = "What color is the belt in this image?<image>"
image = Image.open(requests.get("https://fujisports.com/cdn/shop/products/FUJIADULTBJJBELTS_0003s_0000_fuji__bjjadultbelt__beltbjjwhite__white__1_1_1_3_1600x1600.jpg?v=1644942404", stream=True).raw)

inputs = processor(prompt, images=image, return_tensors="pt").to(model.device, torch.bfloat16)
generated_ids = model.generate(**inputs, max_new_tokens=100)
out = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(out)
[/CODE]
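To try the sampling parameters mentioned above, they can be passed straight to generate(); a minimal sketch using the suggested values (these are not defaults shipped with the model):
[CODE]
# sample instead of greedy decoding, with the parameters suggested above
generated_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
[/CODE]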
Awesome, @RaushanTurganbay, I pulled the latest Transformers commits, reinstalled, and then ran your code. That worked. This is the result I got:
[CODE]
/mount/ae_cvm/transformers/src/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:09<00:00,  3.15s/it]
Some kwargs in processor config are unused and will not have any effect: image_seq_length, image_token.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
What color is the belt in this image?The belt in this image is white.
[/CODE]
It wasn't quick though. The whole inference took about 4 hours, but it did get there in the end. I am using an Nvidia Jetson Xavier AGX for this, so my PyTorch is a special build from Nvidia, available through their docker container: l4t-text-generation. That's probably also the reason why I had to slightly change your code and remove the device_map parameter where you load the model. With that parameter I was getting this:
[CODE]
Traceback (most recent call last):
File "test3.py", line 18, in <module>
generated_ids = model.generate(**inputs, max_new_tokens=100)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/generation/utils.py", line 1980, in generate
result = self._sample(
File "/mount/ae_cvm/transformers/src/transformers/generation/utils.py", line 2923, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1533, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1298, in forward
causal_mask = self._update_causal_mask(
File "/mount/ae_cvm/transformers/src/transformers/models/chameleon/modeling_chameleon.py", line 1421, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Anyway, I'm glad that it worked. Now it would be great to speed it up. Any chance for a quantized version?
Thanks
@archie1983 glad that it worked. It is super slow because, by removing device_map, you're running on CPU. I guess the error you saw is related to the installed torch version; can you try updating it?
@RaushanTurganbay Torch in my case comes from Nvidia: they build and maintain what works on Jetson. The latest torch they had available for my hardware (Jetson AGX Xavier) and my JetPack (5.1.2) was torch 2.1.0, and it was still giving the same problem. But I did find a solution, and I'm posting it here for posterity so that anybody who hits this problem in the future has it:
https://github.com/meta-llama/llama3/issues/110#issuecomment-2178078985
After using their method #1 my inference now runs in mere seconds.
Thanks
Can I run this on my CPU device?
@Pollecode yes, you can run on CPU by simply removing "device_map" when loading the model. But the generation will be very slow, around 4 hours according to the comments above.
In case you can't fit the model in fp16 on your GPU, try out quantization (https://huggingface.co/docs/peft/en/developer_guides/quantization#quantize-a-model); see the sketch below.
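For example, a minimal sketch of 4-bit loading with bitsandbytes (this assumes bitsandbytes is installed and a CUDA GPU is available; not tested on Chameleon specifically):
[CODE]
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration, BitsAndBytesConfig
import torch

# 4-bit quantization config: reduces memory footprint at some cost in quality
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b",
    quantization_config=quantization_config,
    device_map="auto",
)
processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
[/CODE]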
Thanks