LoRA Finetuning - Text vs Vision Effects

#54
by brecker

2 questions:

  1. Are you able to finetune only the textual aspect of Phi3V without finetuning the vision component?
    - Say I want to 'retain' (I assume they will degrade with finetuning) the model's vision capabilities while finetuning it for function calling etc. The multi-modality of the model is important to me.
  2. Which modules are best to target when performing LoRA FT on Phi3V?

  1. Yes, you can.
  • Actually, when using LoRA, the official code shows fine-tuning only the language_model part.
  2. I'm not exactly sure, so I just target all of the layers except for the "lm_head" (see the sketch below).
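
For reference, here's a minimal sketch of that second point using PEFT, assuming the Hugging Face `microsoft/Phi-3-vision-128k-instruct` checkpoint. The name filters (`vision_embed_tokens`, `lm_head`) are assumptions about the module layout, not something from the thread; verify them against `model.named_modules()` before training.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Collect Linear layers, skipping the vision tower (which also contains
# img_projection under vision_embed_tokens) and the lm_head. Assumed names;
# check model.named_modules() for your checkpoint.
target_modules = [
    name
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
    and "vision_embed_tokens" not in name
    and "lm_head" not in name
]

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=target_modules,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Passing explicit module names keeps the LoRA adapters on the language-model side only, which is the "retain vision, tune text" setup asked about in question 1.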

Thank you, @2U1. Can you point me to what you're referring to here: 'the official code shows fine-tuning only the language_model part'?

@brecker
https://github.com/microsoft/Phi-3CookBook/blob/20d56d79cfd38eb175118ecc961a9b49e2341de2/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_hateful_memes.py#L374-L384

Here's the link for you.
However, the img_projection layer is in the vision_model part, so it gets frozen as well.

If you need to control freezing/unfreezing all three parts independently, you can use my code.
https://github.com/2U1/Phi3-Vision-ft
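
For anyone who just wants the idea without the full repo, here's a minimal sketch (not the code from that repository) of toggling the three blocks by parameter name. The substrings are assumptions based on the Hugging Face Phi-3-Vision layout, where the CLIP tower and the img_projection connector both live under `vision_embed_tokens`; confirm them with `model.named_parameters()`.

```python
def set_trainable(model, train_vision=False, train_projector=True, train_llm=True):
    """Freeze/unfreeze the vision tower, the img_projection connector,
    and the language model independently. Name substrings are assumptions."""
    for name, param in model.named_parameters():
        if "img_projection" in name:            # vision-to-LLM connector
            param.requires_grad = train_projector
        elif "vision_embed_tokens" in name:     # CLIP vision tower and related params
            param.requires_grad = train_vision
        else:                                   # everything else: the language model
            param.requires_grad = train_llm

# Example: keep the vision tower frozen, tune the projector and the LLM.
set_trainable(model, train_vision=False, train_projector=True, train_llm=True)
```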
