Some issues regarding training

#9
by zwq2018 - opened

Hello, I am running continued pre-training on top of the base model (idefics2-8b-base), and I have some questions.

  1. When I used the Llava1.0 image-text pairs (595k pretraining data) with idefics2-8b-base for pre-training, I found that the initial loss was very high, around 6-7. Is this abnormal? May I ask roughly what the loss was when you finished pretraining the 8b base model?

  2. I found that the ignore_index seems to differ between versions of transformers. For example, with transformers 4.40 you set image_token_id (32001) as the ignore_index, while in version 4.42 you use -100. I use the following code to set my labels:

    labels[labels == self.processor.tokenizer.pad_token_id] = -100
    labels[labels == image_token_id] = -100

  3. When setting the ignore labels, should fake_token_around_image (32000) be ignored, or should its loss be calculated? That is, do I need the following line:

    labels[labels == fake_image_token_id] = -100

  1. Loss values depend on many factors. To check whether there is a bug, you should track performance on different tasks instead.
  2. Yes, we ignore the loss on both the pad tokens and the image tokens.
  3. We didn't mask the loss on these tokens, but you can do so, yes.
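
To make the masking options from the three points above concrete, here is a minimal, framework-free sketch. It is not the Idefics2 training code: the helper name mask_labels and the pad id 0 in the toy batch are assumptions made up for illustration; only the ids 32000 and 32001 and the value -100 come from this thread. -100 is also the default ignore_index of PyTorch's CrossEntropyLoss, so positions set to it do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, pad_token_id, image_token_id, fake_image_token_id=None):
    """Return a copy of input_ids with masked positions set to IGNORE_INDEX.

    Hypothetical helper for illustration. Pad and image tokens are always
    masked; the fake-image token is masked only if its id is passed in.
    """
    masked = {pad_token_id, image_token_id}
    if fake_image_token_id is not None:
        masked.add(fake_image_token_id)
    return [
        [IGNORE_INDEX if tok in masked else tok for tok in seq]
        for seq in input_ids
    ]


# Toy batch: 32000 / 32001 are the ids quoted in the thread; pad id 0 is made up.
batch = [[1, 32000, 32001, 32001, 32000, 5, 6, 0, 0]]

# Mask pad and image tokens only (fake-image token still contributes to the loss).
labels_default = mask_labels(batch, pad_token_id=0, image_token_id=32001)

# Additionally mask the fake-image token (optional, per point 3).
labels_strict = mask_labels(
    batch, pad_token_id=0, image_token_id=32001, fake_image_token_id=32000
)

print(labels_default)
print(labels_strict)
```

With tensors, the same effect is obtained by the boolean-indexing assignments shown in the question, applied to a clone of input_ids.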
